Intel’s Death and Potential Revival

2025-02-022025-02-02 TE 0 Comments

https://stratechery.com/2024/intels-death-and-potential-revival

In 1980 IBM, under pressure from its customers to provide computers for personal use, not just mainframes, set out to create the IBM PC; given the project’s low internal priority but high external demand they decided to outsource two critical components: Microsoft would provide the DOS operating system, which would run on the Intel 8088 processor.

Those two deals would shape the computing industry for the following 27 years. Given that the point of the personal computer was to run applications, the operating system that provided the APIs for those applications would have unassailable lock-in, leading to Microsoft’s dominance with first DOS and then Windows, which was backwards compatible.

The 8088 processor, meanwhile, was a low-cost variant of the 8086 processor; up to that point most new processors came with their own instruction set, but the 8088 and 8086 used the same instruction set, which became the foundation of Intel processors going forward. That meant that the 286 and 386 processors that followed were backwards compatible with the 8088 in the IBM PC; in other words, Intel, too, had lock-in, and not just with MS-DOS: while the majority of applications leveraged operating system-provided APIs, it was much more common at that point in computing history to leverage lower level APIs, including calling on the processor instruction set directly. This was particularly pertinent for things like drivers, which powered all of the various peripherals a PC required.

Intel’s CISC Moat

The 8086 processor that undergirded the x86 instruction set was introduced in 1978, when memory was limited, expensive, and slow; that’s why the x86 used Complex Instruction Set Computing (CISC), which combined multiple steps into a single instruction. The price of this complexity was the necessity of microcode, dedicated logic that translated CISC instructions into its component steps so they could actually be executed.

The same year that IBM cut those deals, however, was the year that David Patterson and a team at Berkeley started work on what became known as the RISC-1 processor, which took an entirely different approach: Reduced Instruction Set Computing (RISC) replaced the microcode-focused transistors with registers, i.e. memory that operated at the same speed as the processor itself, and filled them with simple instructions that corresponded directly to transistor functionality. This would, in theory, allow for faster computing with the same number of transistors, but memory access was still expensive and more likely to be invoked given the greater number of instructions necessary to do anything, and programs and compilers needed to be completely reworked to take advantage of the new approach.

Intel, more than anyone, realized that this would be manageable in the long run. “Moore’s Law”, the observation that the number of transistors in an integrated circuit doubles every two years, was coined by Gordon Moore, their co-founder and second CEO; the implication for instruction sets was that increased software complexity and slow hardware would be solved through ever faster chips, and those chips could get even faster if they were simplified RISC designs. That is why most of the company wanted, in the mid-1980s, to abandon x86 and its CISC instruction set for RISC.

There was one man, however, who interpreted the Moore’s Law implications differently, and that was Pat Gelsinger; he led the development of the 486 processor and was adamant that Intel stick with CISC, as he explained in an oral history at the Computer Museum:

https://videopress.com/embed/k9yIgNfM?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadata

Gelsinger: We had a mutual friend that found out that we had Mr. CISC working as a student of Mr. RISC, the commercial versus the university, the old versus the new, teacher versus student. We had public debates of John and Pat. And Bear Stearns had a big investor conference, a couple thousand people in the audience, and there was a public debate of RISC versus CISC at the time, of John versus Pat.

And I start laying out the dogma of instruction set compatibility, architectural coherence, how software always becomes the determinant of any computer architecture being developed. “Software follows instruction set. Instruction set follows Moore’s Law. And unless you’re 10X better and John, you’re not 10X better, you’re lucky if you’re 2X better, Moore’s Law will just swamp you over time because architectural compatibility becomes so dominant in the adoption of any new computer platform.” And this is when x86– there was no server x86. There’s no clouds at this point in time. And John and I got into this big public debate and it was so popular.

Brock: So the claim wasn’t that the CISC could beat the RISC or keep up to what exactly but the other overwhelming factors would make it the winner in the end.

Gelsinger: Exactly. The argument was based on three fundamental tenets. One is that the gap was dramatically overstated and it wasn’t an asymptotic gap. There was a complexity gap associated with it but you’re going to make it leap up and that the CISC architecture could continue to benefit from Moore’s Law. And that Moore’s Law would continue to carry that forward based on simple ones, number of transistors to attack the CISC problems, frequency of transistors. You’ve got performance for free. And if that gap was in a reasonable frame, you know, if it’s less than 2x, hey, in a Moore’s Law’s term that’s less than a process generation. And the process generation is two years long. So how long does it take you to develop new software, porting operating systems, creating optimized compilers? If it’s less than five years you’re doing extraordinary in building new software systems. So if that gap is less than five years I’m going to crush you John because you cannot possibly establish a new architectural framework for which I’m not going to beat you just based on Moore’s Law, and the natural aggregation of the computer architecture benefits that I can bring in a compatible machine. And, of course, I was right and he was wrong.

Intel would, over time, create more RISC-like processors, switching out microcode for micro-ops processing units that dynamically generated RISC-like instructions from CISC-based software that maintained backwards compatibility; Gelsinger was right that no one wanted to take the time to rewrite all of the software that assumed an x86 instruction set when Intel processors were getting faster all of the time, and far out-pacing RISC alternatives thanks to Intel’s manufacturing prowess.

That, though, turned out to be Intel’s soft underbelly; while the late Intel CEO Paul Ottelini claimed that he turned down the iPhone processor contract because of price, Tony Fadell, who led the creation of the iPod and iPhone hardware, told me in a Stratechery Interview that the real issue was Intel’s obsession with performance and neglect of efficiency.

The new dimension that always came in with embedded computing was always the power element, because on battery-operated devices, you have to rethink how you do your interrupt structures, how you do your networking, how you do your memory. You have to think about so many other parameters when you think about power and doing enough processing effectively, while having long battery life. So everything for me was about long, long battery life and why do we do what we do? David Tupman was on the team, the iPod team with me, he would always say every nanocoulomb was sacred, and we would go after that and say, “Okay, where’s that next coulomb? Where are we going to go after it?” And so when you take that microscopic view of what you’re building, you look at the world very differently.

For me, when it came to Intel at the time, back in the mid-2000s, they were always about, “Well, we’ll just repackage what we have on the desktop for the laptop and then we’ll repackage that again for embedding.” It reminded me of Windows saying, “I’m going to do Windows and then I’m going to do Windows Mobile and I’m going to do Windows embedded.” It was using those same cores and kernels and trying to slim them down.

I was always going, “Look, do you see how the iPhone was created? It started with the guts of the iPod, and we grew up from very little computing, and very little space, and we grew into an iPhone, and added more layers to it.” But we weren’t taking something big and shrinking it down. We were starting from the bottom up and yeah, we were taking Mac OS and shrinking it down, but we were significantly shrinking it down. Most people don’t want to take those real hard cuts to everything because they’re too worried about compatibility. Whereas if you’re just taking pieces and not worrying about compatibility, it’s a very different way of thinking about how building and designing products happens.

This is why I was so specific with that “27 year” reference above; Apple’s 2007 launch of the iPhone marked the end of both Microsoft and Intel’s dominance, and for the same reason. The shift to efficiency as the top priority meant that you needed to rewrite everything; that, by extension, meant that Microsoft’s API and Intel’s x86 instruction set were no longer moats but millstones. On the operating side Apple stripped macOS to the bones and rebuilt it for efficiency; that became iOS, and the new foundation for apps; on the processor side Apple used processors based on the ARM instruction set, which was RISC from the beginning. Yes, that meant a lot of things had to be rewritten, but here the rewriting wasn’t happening by choice, but by necessity.

This leads, as I remarked to Fadell in that interview, to a rather sympathetic interpretation of Microsoft and Intel’s failure to capture the mobile market; neither company had a chance. They were too invested in the dominant paradigm at the time, and thus unable to start from scratch; by the time they realized their mistake, Apple, Android, and ARM had already won.

Intel’s Missed Opportunity

It was their respective response to missing mobile that saved Microsoft, and doomed Intel. For the first seven years of the iPhone both companies refused to accept their failure, and tried desperately to leverage what they viewed as their unassailable advantages: Microsoft declined to put its productivity applications on iOS or Android, trying to get customers to adopt Windows Mobile, while Intel tried to bring its manufacturing prowess to bear to build processors that were sufficiently efficient while still being x86 compatible.

It was in 2014 that their paths diverged: Microsoft named Satya Nadella its new CEO, and his first public decision was to launch Office on iPad. This was a declaration of purpose: Microsoft would no longer be defined by Windows, and would instead focus on Azure and the cloud; no, that didn’t have the software lock-in of Windows — particularly since a key Azure decision was shifting from Windows servers to Linux — but it was a business that met Microsoft’s customers where they were, and gave the company a route to participating in the massive business opportunities enabled by mobile (given that most apps are in fact cloud services), and eventually, AI.

The equivalent choice for Intel would have been to start manufacturing ARM chips for 3rd parties, i.e. becoming a foundry instead of an integrated device manufacturer (IDM); I wrote that they should do exactly that in 2013:

It is manufacturing capability, on the other hand, that is increasingly rare, and thus, increasingly valuable. In fact, today there are only four major foundries: Samsung, GlobalFoundries, Taiwan Semiconductor Manufacturing Company, and Intel. Only four companies have the capacity to build the chips that are in every mobile device today, and in everything tomorrow.

Massive demand, limited suppliers, huge barriers to entry. It’s a good time to be a manufacturing company. It is, potentially, a good time to be Intel. After all, of those four companies, the most advanced, by a significant margin, is Intel. The only problem is that Intel sees themselves as a design company, come hell or high water.

Making chips for other companies would have required an overhaul of Intel’s culture and processes for the sake of what was then a significantly lower margin opportunity; Intel wasn’t interested, and proceeded to make a ton of money building server chips for the cloud.

In fact, though, the company was already fatally wounded. Mobile meant volume, and as the cost of new processes skyrocketed, the need for volume to leverage those costs skyrocketed as well. It was TSMC that met the moment, with Apple’s assistance: the iPhone maker would buy out the first year of every new process advancement, giving TSMC the confidence to invest, and eventually surpass Intel. That, in turn, benefited AMD, Intel’s long-time rival which now fabbed its chips at TSMC, which not only had better processor designs but, for the first time, had a better process, leading to huge gains in the data center. All of that low-level work on ARM, meanwhile, helped make ARM in PCS and in the datacenter viable, putting further pressure on Intel’s core markets.

AI was the final blow: not only did Intel not have a competitive product, it also did not have a foundry through which it could have benefitted from the exploding demand for AI chips; making matters worse is the fact that data center spending on GPUs is coming at the expense of traditional server chips, Intel’s core market.

Intel’s Death

The fundamental flaw with Pat Gelsinger’s 2020 return to Intel and his IDM 2.0 plan is that it was a decade too late. Gelsinger’s plan was to become a foundry, with Intel as its first-best customer. The former was the way to participate in mobile and AI and gain the volume necessary to push technology forward, which Intel has always done better than anyone else (EUV was the exception to the rule that Intel invents and introduces every new advance in processor technology); the latter was the way to fund the foundry and give it guaranteed volume.

Again, this is exactly what Intel should have done a decade ago, while TSMC was still in their rear-view mirror in terms of processing technology, and when its products were still dominant in PCs and the data center. By the time Gelsinger came on board, though, it was already too late: Intel’s process was behind, its product market share was threatened on all of the fronts noted above, and high-performance ARM processors had been built by TSMC for years (which meant a big advantage in terms of pre-existing IP, design software, etc.). Intel brought nothing to the table as a foundry other than being a potential second source to TSMC, which, to make matters worse, has dramatically increased its investment in leading edge nodes to absorb that skyrocketing demand. Intel’s products, meanwhile, are either non-competitive (because they are made by Intel) or not-very-profitable (because they are made by TSMC), which means that Intel is simply running out of cash.

Given this, you can make the case that Gelsinger was never the right person for the job; shortly after he took over I wrote in Intel Problems that the company needed to be split up, but he told me in a 2022 Stratechery Interview that he — and the board — weren’t interested in that:

So last week, AMD briefly passed Intel in market value, and I think Nvidia did a while ago, and neither of these companies build their own chips. It’s kind of like an inverse of the Jerry Sanders quote about “Real men have fabs!” When you were contemplating your strategy for Intel as you came back, how much consideration was there about going the same path, becoming a fabless company and leaning into your design?

PG: Let me give maybe three different answers to that question, and these become more intellectual as we go along. The first one was I wrote a strategy document for the board of directors and I said if you want to split the company in two, then you should hire a PE kind of guy to go do that, not me. My strategy is what’s become IDM 2.0 and I described it. So if you’re hiring me, that’s the strategy and 100% of the board asked me to be the CEO and supported the strategy I laid out, of which this is one of the pieces. So the first thing was all of that discussion happened before I took the job as the CEO, so there was no debate, no contemplation, et cetera, this is it.

Fast forward to last week, and the Intel board — which is a long-running disaster — is no longer on board, firing Gelsinger in the process. And, to be honest, I noted a couple of months ago that Gelsinger’s plan probably wasn’t going to work without a split and a massive cash infusion from the U.S. government, far in excess of the CHIPS Act.

That, though, doesn’t let the board off the hook: not only are they abandoning a plan they supported, their ideas for moving Intel forward are fundamentally wrong. Chairman Frank Yeary, who has inexplicable been promoted despite being present for the entirety of the Intel disaster, said in Intel’s press release about Gelsinger’s departure:

While we have made significant progress in regaining manufacturing competitiveness and building the capabilities to be a world-class foundry, we know that we have much more work to do at the company and are committed to restoring investor confidence. As a board, we know first and foremost that we must put our product group at the center of all we do. Our customers demand this from us, and we will deliver for them. With MJ’s permanent elevation to CEO of Intel Products along with her interim co-CEO role of Intel, we are ensuring the product group will have the resources needed to deliver for our customers. Ultimately, returning to process leadership is central to product leadership, and we will remain focused on that mission while driving greater efficiency and improved profitability.

Intel’s products are irrelevant to the future; that’s the fundamental foundry problem. If x86 still mattered, then Intel would be making enough money to fund its foundry efforts. Moreover, prospective Intel customers are wary that Intel — as it always has — will favor itself at the expense of its customers; the board is saying that is exactly what they want to do.

In fact, it is Intel’s manufacturing that must be saved. This is a business that yes, needs billions upon billions of dollars in funding, but it not only has a market as a TSMC competitor, but also the potential to lead that market in the long run. Moreover, Intel foundry existing is critical to national security: currently the U.S. is completely dependent on TSMC and Taiwan and all of the geopolitical risk that entails. That means it will fall on the U.S. government to figure out a solution.

Saving Intel

Last month, in A Chance to Build, I explained how tech has modularized itself over the decades, with hardware — including semiconductor fabrication — largely being outsourced to Asia, while software is developed in the U.S. The economic forces undergirding this modularization, including the path dependency from the past sixty years, will be difficult to overcome, even with tariffs.

Apple could not only not manufacture an iPhone in the U.S. because of cost, it also can’t do so because of capability; that capability is downstream of an ecosystem that has developed in Asia and a long learning curve that China has traveled and that the U.S. has abandoned. Ultimately, though, the benefit to Apple has been profound: the company has the best supply chain in the world, centered in China, that gives it the capability to build computers on an unimaginable scale with maximum quality for not that much money at all. This benefit has extended to every tech company, whether they make their own hardware or not. Software has to run on something, whether that be servers or computers or phones; hardware is software’s most essential complement.

The inverse may be the key to American manufacturing: software as hardware’s grantor of viability through integration. This is what Tesla did: the company is deeply integrated from software down through components, and builds vehicles in California (of course it has an even greater advantage with its China factory).

This is also what made Intel profitable for so long: the company’s lock-in was predicated on software, which allowed for massive profit margins that funded all of that innovation and leading edge processes in America, even as every other part of the hardware value chain went abroad. And, by extension, the reason why a product focus is a dead end for the company is because nothing is preserving x86 other than the status quo.

It follows, then, that if the U.S. wants to make Intel viable, it ideally will not just give out money, but also a point of integration. To that end, consider this report from Reuters:

A U.S. congressional commission on Tuesday proposed a Manhattan Project-style initiative to fund the development of AI systems that will be as smart or smarter than humans, amid intensifying competition with China over advanced technologies. The bipartisan U.S.-China Economic and Security Review Commission stressed that public-private partnerships are key in advancing artificial general intelligence, but did not give any specific investment strategies as it released its annual report.

To quote the report’s recommendation directly:

The Commission recommends:

Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as systems that are as good as or better than human capabilities across all cognitive domains and would surpass the sharpest human minds at every task. Among the specific actions the Commission
recommends for Congress:

Provide broad multiyear contracting authority to the executive branch and associated funding for leading artificial intelligence, cloud, and data center companies and others to advance the stated policy at a pace and scale consistent with the goal of U.S. AGI leadership; and

Direct the U.S. secretary of defense to provide a Defense Priorities and Allocations System “DX Rating” to items in the artificial intelligence ecosystem to ensure this project receives national priority.

The problem with this proposal is that spending the money via “public-private partnerships” will simply lock-in the current paradigm; I explained in A Chance to Build:

Software runs on hardware, and here Asia dominates. Consider AI:

Chip design, a zero marginal cost activity, is done by Nvidia, a Silicon Valley company.

Chip manufacturing, a minimal marginal cost activity that requires massive amounts of tacit knowledge gained through experience, is done by TSMC, a Taiwanese company.

An AI system contains multiple components beyond the chip, many if not most of which are manufactured in China, or other countries in Asia.

Final assembly generally happens outside of China due to U.S. export controls; Foxconn, for example, assembles many of its systems in Mexico.

AI is deployed mostly by U.S. companies, and the vast majority of application development is done by tech companies and startups, primarily in Silicon Valley.

The fact that the U.S. is the bread in the AI sandwich is no accident: those are the parts of the value chain where marginal cost is non-existent and where the software talent has the highest leverage. Similarly, it’s no accident that the highest value add in terms of hardware happens in Asia, where expertise has been developing for fifty years. The easiest — and by extension, most low-value — aspect is assembly, which can happen anywhere labor is cheap.

Given this, if the U.S. is serious about AGI, then the true Manhattan Project — doing something that will be very expensive and not necessarily economically rational — is filling in the middle of the sandwich. Saving Intel, in other words.

Start with the fact that we know that leading AI model companies are interested in dedicated chips; OpenAI is reportedly working on its own chip with Broadcom, after flirting with the idea of building its own fabs. The latter isn’t viable for a software company in a world where TSMC exists, but it is for the U.S. government if it’s serious about domestic capabilities continuing to exist. The same story applies to Google, Amazon, Microsoft, and Meta.

To that end, the U.S. government could fund an independent Intel foundry — spin out the product group along with the clueless board to Broadcom or Qualcomm or private equity — and provide price support for model builders to design and buy their chips there. Or, if the U.S. government wanted to build the whole sandwich, it could directly fund model builders — including one developed in-house — and dictate that they not just use but deeply integrate with Intel-fabricated integrated chips (it’s not out of the question that a fully integrated stack might actually be the optimal route to AGI).

It would, to be sure, be a challenge to keep such an effort out of the clutches of the federal bureaucracy and the dysfunction that has befallen the U.S. defense industry. It would be essential to give this effort the level of independence and freedom that the original Manhattan Project had, with compensation packages to match; perhaps this would be a better use of Elon Musk’s time — himself another model builder — than DOGE?

This could certainly be bearish for Nvidia, at least in the long run. Nvidia is a top priority for TSMC, and almost certainly has no interest in going anywhere else; that’s also why it would be self-defeating for a U.S. “Manhattan Project” to simply fund the status quo, which is Nvidia chips manufactured in Taiwan. Competition is ok, though; the point isn’t to kill TSMC, but to stand up a truly domestic alternative (i.e. not just a fraction of non-leading edge capacity in Arizona). Nvidia for its part deserves all of the success it is enjoying, but government-funded alternatives would ultimately manifest for consumers and businesses as lower prices for intelligence.

This is all pretty fuzzy, to be clear. What does exist, however, is a need — domestically sourced and controlled AI, which must include chips — and a company, in Intel, that is best placed to meet that need, even as it needs a rescue. Intel lost its reason to exist, even as the U.S. needs it to exist more than ever; AI is the potential integration point to solve both problems at the same time.

=====================

DeepSeek FAQMonday, January 27, 2025Listen to PodcastListen to this post:Log in to listenIt’s Monday, January 27. Why haven’t you written about DeepSeek yet?I did! I wrote about R1 last Tuesday.I totally forgot about that.I take responsibility. I stand by the post, including the two biggest takeaways that I highlighted (emergent chain-of-thought via pure reinforcement learning, and the power of distillation), and I mentioned the low cost (which I expanded on in Sharp Tech) and chip ban implications, but those observations were too localized to the current state of the art in AI. What I totally failed to anticipate were the broader implications this news would have to the overall meta-discussion, particularly in terms of the U.S. and China.Is there precedent for such a miss?There is. In September 2023 Huawei announced the Mate 60 Pro with a SMIC-manufactured 7nm chip. The existence of this chip wasn’t a surprise for those paying close attention: SMIC had made a 7nm chip a year earlier (the existence of which I had noted even earlier than that), and TSMC had shipped 7nm chips in volume using nothing but DUV lithography (later iterations of 7nm were the first to use EUV). Intel had also made 10nm (TSMC 7nm equivalent) chips years earlier using nothing but DUV, but couldn’t do so with profitable yields; the idea that SMIC could ship 7nm chips using their existing equipment, particularly if they didn’t care about yields, wasn’t remotely surprising — to me, anyways.What I totally failed to anticipate was the overwrought reaction in Washington D.C. The dramatic expansion in the chip ban that culminated in the Biden administration transforming chip sales to a permission-based structure was downstream from people not understanding the intricacies of chip production, and being totally blindsided by the Huawei Mate 60 Pro. I get the sense that something similar has happened over the last 72 hours: the details of what DeepSeek has accomplished — and what they have not — are less important than the reaction and what that reaction says about people’s pre-existing assumptions.So what did DeepSeek announce?The most proximate announcement to this weekend’s meltdown was R1, a reasoning model that is similar to OpenAI’s o1. However, many of the revelations that contributed to the meltdown — including DeepSeek’s training costs — actually accompanied the V3 announcement over Christmas. Moreover, many of the breakthroughs that undergirded V3 were actually revealed with the release of the V2 model last January.Is this model naming convention the greatest crime that OpenAI has committed?Second greatest; we’ll get to the greatest momentarily.Let’s work backwards: what was the V2 model, and why was it important?The DeepSeek-V2 model introduced two important breakthroughs: DeepSeekMoE and DeepSeekMLA. The “MoE” in DeepSeekMoE refers to “mixture of experts”. Some models, like GPT-3.5, activate the entire model during both training and inference; it turns out, however, that not every part of the model is necessary for the topic at hand. MoE splits the model into multiple “experts” and only activates the ones that are necessary; GPT-4 was a MoE model that was believed to have 16 experts with approximately 110 billion parameters each.DeepSeekMoE, as implemented in V2, introduced important innovations on this concept, including differentiating between more finely-grained specialized experts, and shared experts with more generalized capabilities. Critically, DeepSeekMoE also introduced new approaches to load-balancing and routing during training; traditionally MoE increased communications overhead in training in exchange for efficient inference, but DeepSeek’s approach made training more efficient as well.DeepSeekMLA was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required: you both need to load the model into memory and also load the entire context window. Context windows are particularly expensive in terms of memory, as every token requires both a key and corresponding value; DeepSeekMLA, or multi-head latent attention, makes it possible to compress the key-value store, dramatically decreasing memory usage during inference.I’m not sure I understood any of that.The key implications of these breakthroughs — and the part you need to understand — only became apparent with V3, which added a new approach to load balancing (further reducing communications overhead) and multi-token prediction in training (further densifying each training step, again reducing overhead): V3 was shockingly cheap to train. DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million.That seems impossibly low.DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1, achieved through our optimized co-design of algorithms, frameworks, and hardware. During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pre- training stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.So no, you can’t replicate DeepSeek the company for $5.576 million.I still don’t believe that number.Actually, the burden of proof is on the doubters, at least once you understand the V3 architecture. Remember that bit about DeepSeekMoE: V3 has 671 billion parameters, but only 37 billion parameters in the active expert are computed per token; this equates to 333.3 billion FLOPs of compute per token. Here I should mention another DeepSeek innovation: while parameters were stored with BF16 or FP32 precision, they were reduced to FP8 precision for calculations; 2048 H800 GPUs have a capacity of 3.97 exoflops, i.e. 3.97 billion billion FLOPS. The training set, meanwhile, consisted of 14.8 trillion tokens; once you do all of the math it becomes apparent that 2.8 million H800 hours is sufficient for training V3. Again, this was just the final run, not the total cost, but it’s a plausible number.Scale AI CEO Alexandr Wang said they have 50,000 H100s.I don’t know where Wang got his information; I’m guessing he’s referring to this November 2024 tweet from Dylan Patel, which says that DeepSeek had “over 50k Hopper GPUs”. H800s, however, are Hopper GPUs, they just have much more constrained memory bandwidth than H100s because of U.S. sanctions.Here’s the thing: a huge number of the innovations I explained above are about overcoming the lack of memory bandwidth implied in using H800s instead of H100s. Moreover, if you actually did the math on the previous question, you would realize that DeepSeek actually had an excess of computing; that’s because DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language. This is an insane level of optimization that only makes sense if you are using H800s.Meanwhile, DeepSeek also makes their models available for inference: that requires a whole bunch of GPUs above-and-beyond whatever was used for training.So was this a violation of the chip ban?Nope. H100s were prohibited by the chip ban, but not H800s. Everyone assumed that training leading edge models required more interchip memory bandwidth, but that is exactly what DeepSeek optimized both their model structure and infrastructure around.Again, just to emphasize this point, all of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800; if DeepSeek had access to H100s, they probably would have used a larger training cluster with much fewer optimizations specifically focused on overcoming the lack of bandwidth.So V3 is a leading edge model?It’s definitely competitive with OpenAI’s 4o and Anthropic’s Sonnet-3.5, and appears to be better than Llama’s biggest model. What does seem likely is that DeepSeek was able to distill those models to give V3 high quality tokens to train on.What is distillation?Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.Distillation obviously violates the terms of service of various models, but the only way to stop it is to actually cut off access, via IP banning, rate limiting, etc. It’s assumed to be widespread in terms of model training, and is why there are an ever-increasing number of models converging on GPT-4o quality. This doesn’t mean that we know for a fact that DeepSeek distilled 4o or Claude, but frankly, it would be odd if they didn’t.Distillation seems terrible for leading edge models.It is! On the positive side, OpenAI and Anthropic and Google are almost certainly using distillation to optimize the models they use for inference for their consumer-facing apps; on the negative side, they are effectively bearing the entire cost of training the leading edge, while everyone else is free-riding on their investment.Indeed, this is probably the core economic factor undergirding the slow divorce of Microsoft and OpenAI. Microsoft is interested in providing inference to its customers, but much less enthused about funding $100 billion data centers to train leading edge models that are likely to be commoditized long before that $100 billion is depreciated.Is this why all of the Big Tech stock prices are down?In the long run, model commoditization and cheaper inference — which DeepSeek has also demonstrated — is great for Big Tech. A world where Microsoft gets to provide inference to its customers for a fraction of the cost means that Microsoft has to spend less on data centers and GPUs, or, just as likely, sees dramatically higher usage given that inference is so much cheaper. Another big winner is Amazon: AWS has by-and-large failed to make their own quality model, but that doesn’t matter if there are very high quality open source models that they can serve at far lower costs than expected.Apple is also a big winner. Dramatically decreased memory requirements for inference make edge inference much more viable, and Apple has the best hardware for exactly that. Apple Silicon uses unified memory, which means that the CPU, GPU, and NPU (neural processing unit) have access to a shared pool of memory; this means that Apple’s high-end hardware actually has the best consumer chip for inference (Nvidia gaming GPUs max out at 32GB of VRAM, while Apple’s chips go up to 192 GB of RAM).Meta, meanwhile, is the biggest winner of all. I already laid out last fall how every aspect of Meta’s business benefits from AI; a big barrier to realizing that vision is the cost of inference, which means that dramatically cheaper inference — and dramatically cheaper training, given the need for Meta to stay on the cutting edge — makes that vision much more achievable.Google, meanwhile, is probably in worse shape: a world of decreased hardware requirements lessens the relative advantage they have from TPUs. More importantly, a world of zero-cost inference increases the viability and likelihood of products that displace search; granted, Google gets lower costs as well, but any change from the status quo is probably a net negative.I asked why the stock prices are down; you just painted a positive picture!My picture is of the long run; today is the short run, and it seems likely the market is working through the shock of R1’s existence.Wait, you haven’t even talked about R1 yet.R1 is a reasoning model like OpenAI’s o1. It has the ability to think through a problem, producing much higher quality results, particularly in areas like coding, math, and logic (but I repeat myself).Is this more impressive than V3?Actually, the reason why I spent so much time on V3 is that that was the model that actually demonstrated a lot of the dynamics that seem to be generating so much surprise and controversy. R1 is notable, however, because o1 stood alone as the only reasoning model on the market, and the clearest sign that OpenAI was the market leader.R1 undoes the o1 mythology in a couple of important ways. First, there is the fact that it exists. OpenAI does not have some sort of special sauce that can’t be replicated. Second, R1 — like all of DeepSeek’s models — has open weights (the problem with saying “open source” is that we don’t have the data that went into creating it). This means that instead of paying OpenAI to get reasoning, you can run R1 on the server of your choice, or even locally, at dramatically lower cost.How did DeepSeek make R1?DeepSeek actually made two models: R1 and R1-Zero. I actually think that R1-Zero is the bigger deal; as I noted above, it was my biggest focus in last Tuesday’s Update:R1-Zero, though, is the bigger deal in my mind. From the paper:In this paper, we take the first step toward improving language model reasoning capabilities using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ GRPO as the RL framework to improve model performance in reasoning. During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. After thousands of RL steps, DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912.Reinforcement learning is a technique where a machine learning model is given a bunch of data and a reward function. The classic example is AlphaGo, where DeepMind gave the model the rules of Go with the reward function of winning the game, and then let the model figure everything else on its own. This famously ended up working better than other more human-guided techniques.LLMs to date, however, have relied on reinforcement learning with human feedback; humans are in the loop to help guide the model, navigate difficult choices where rewards aren’t obvious, etc. RLHF was the key innovation in transforming GPT-3 into ChatGPT, with well-formed paragraphs, answers that were concise and didn’t trail off into gibberish, etc.R1-Zero, however, drops the HF part — it’s just reinforcement learning. DeepSeek gave the model a set of math, code, and logic questions, and set two reward functions: one for the right answer, and one for the right format that utilized a thinking process. Moreover, the technique was a simple one: instead of trying to evaluate step-by-step (process supervision), or doing a search of all possible answers (a la AlphaGo), DeepSeek encouraged the model to try several different answers at a time and then graded them according to the two reward functions.What emerged is a model that developed reasoning and chains-of-thought on its own, including what DeepSeek called “Aha Moments”:A particularly intriguing phenomenon observed during the training of DeepSeek-R1-Zero is the occurrence of an “aha moment”. This moment, as illustrated in Table 3, occurs in an intermediate version of the model. During this phase, DeepSeek-R1-Zero learns to allocate more thinking time to a problem by reevaluating its initial approach. This behavior is not only a testament to the model’s growing reasoning abilities but also a captivating example of how reinforcement learning can lead to unexpected and sophisticated outcomes.This moment is not only an “aha moment” for the model but also for the researchers observing its behavior. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies. The “aha moment” serves as a powerful reminder of the potential of RL to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future.This is one of the most powerful affirmations yet of The Bitter Lesson: you don’t need to teach the AI how to reason, you can just give it enough compute and data and it will teach itself!Well, almost: R1-Zero reasons, but in a way that humans have trouble understanding. Back to the introduction:However, DeepSeek-R1-Zero encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates a small amount of cold-start data and a multi-stage training pipeline. Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model. Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.This sounds a lot like what OpenAI did for o1: DeepSeek started the model out with a bunch of examples of chain-of-thought thinking so it could learn the proper format for human consumption, and then did the reinforcement learning to enhance its reasoning, along with a number of editing and refinement steps; the output is a model that appears to be very competitive with o1.Here again it seems plausible that DeepSeek benefited from distillation, particularly in terms of training R1. That, though, is itself an important takeaway: we have a situation where AI models are teaching AI models, and where AI models are teaching themselves. We are watching the assembly of an AI takeoff scenario in realtime.So are we close to AGI?It definitely seems like it. This also explains why Softbank (and whatever investors Masayoshi Son brings together) would provide the funding for OpenAI that Microsoft will not: the belief that we are reaching a takeoff point where there will in fact be real returns towards being first.But isn’t R1 now in the lead?I don’t think so; this has been overstated. R1 is competitive with o1, although there do seem to be some holes in its capability that point towards some amount of distillation from o1-Pro. OpenAI, meanwhile, has demonstrated o3, a far more powerful reasoning model. DeepSeek is absolutely the leader in efficiency, but that is different than being the leader overall.So why is everyone freaking out?I think there are multiple factors. First, there is the shock that China has caught up to the leading U.S. labs, despite the widespread assumption that China isn’t as good at software as the U.S.. This is probably the biggest thing I missed in my surprise over the reaction. The reality is that China has an extremely proficient software industry generally, and a very good track record in AI model building specifically.Second is the low training cost for V3, and DeepSeek’s low inference costs. This part was a big surprise for me as well, to be sure, but the numbers are plausible. This, by extension, probably has everyone nervous about Nvidia, which obviously has a big impact on the market.Third is the fact that DeepSeek pulled this off despite the chip ban. Again, though, while there are big loopholes in the chip ban, it seems likely to me that DeepSeek accomplished this with legal chips.I own Nvidia! Am I screwed?There are real challenges this news presents to the Nvidia story. Nvidia has two big moats:

CUDA is the language of choice for anyone programming these models, and CUDA only works on Nvidia chips.
Nvidia has a massive lead in terms of its ability to combine multiple chips together into one large virtual GPU.

These two moats work together. I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn’t, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure. Just look at the U.S. labs: they haven’t spent much time on optimization because Nvidia has been aggressively shipping ever more capable systems that accommodate their needs. The route of least resistance has simply been to pay Nvidia. DeepSeek, however, just demonstrated that another route is available: heavy optimization can produce remarkable results on weaker hardware and with lower memory bandwidth; simply paying Nvidia more isn’t the only way to make better models.That noted, there are three factors still in Nvidia’s favor. First, how capable might DeepSeek’s approach be if applied to H100s, or upcoming GB100s? Just because they found a more efficient way to use compute doesn’t mean that more compute wouldn’t be useful. Second, lower inference costs should, in the long run, drive greater usage. Microsoft CEO Satya Nadella, in a late night tweet almost assuredly directed at the market, said exactly that:Jevons paradox strikes again! As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can’t get enough of. https://t.co/omEcOPhdIz— Satya Nadella (@satyanadella) January 27, 2025Third, reasoning models like R1 and o1 derive their superior performance from using more compute. To the extent that increasing the power and capabilities of AI depend on more compute is the extent that Nvidia stands to benefit!Still, it’s not all rosy. At a minimum DeepSeek’s efficiency and broad availability cast significant doubt on the most optimistic Nvidia growth story, at least in the near term. The payoffs from both model and infrastructure optimization also suggest there are significant gains to be had from exploring alternative approaches to inference in particular. For example, it might be much more plausible to run inference on a standalone AMD GPU, completely sidestepping AMD’s inferior chip-to-chip communications capability. Reasoning models also increase the payoff for inference-only chips that are even more specialized than Nvidia’s GPUs.In short, Nvidia isn’t going anywhere; the Nvidia stock, however, is suddenly facing a lot more uncertainty that hasn’t been priced in. And that, by extension, is going to drag everyone down.So what about the chip ban?The easiest argument to make is that the importance of the chip ban has only been accentuated given the U.S.’s rapidly evaporating lead in software. Software and knowhow can’t be embargoed — we’ve had these debates and realizations before — but chips are physical objects and the U.S. is justified in keeping them away from China.At the same time, there should be some humility about the fact that earlier iterations of the chip ban seem to have directly led to DeepSeek’s innovations. Those innovations, moreover, would extend to not just smuggled Nvidia chips or nerfed ones like the H800, but to Huawei’s Ascend chips as well. Indeed, you can very much make the case that the primary outcome of the chip ban is today’s crash in Nvidia’s stock price.What concerns me is the mindset undergirding something like the chip ban: instead of competing through innovation in the future the U.S. is competing through the denial of innovation in the past. Yes, this may help in the short term — again, DeepSeek would be even more effective with more computing — but in the long run it simply sews the seeds for competition in an industry — chips and semiconductor equipment — over which the U.S. has a dominant position.Like AI models?AI models are a great example. I mentioned above I would get to OpenAI’s greatest crime, which I consider to be the 2023 Biden Executive Order on AI. I wrote in Attenuating Innovation:The point is this: if you accept the premise that regulation locks in incumbents, then it sure is notable that the early AI winners seem the most invested in generating alarm in Washington, D.C. about AI. This despite the fact that their concern is apparently not sufficiently high to, you know, stop their work. No, they are the responsible ones, the ones who care enough to call for regulation; all the better if concerns about imagined harms kneecap inevitable competitors.That paragraph was about OpenAI specifically, and the broader San Francisco AI community generally. For years now we have been subject to hand-wringing about the dangers of AI by the exact same people committed to building it — and controlling it. These alleged dangers were the impetus for OpenAI becoming closed back in 2019 with the release of GPT-2:Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code⁠(opens in a new window). We are not releasing the dataset, training code, or GPT-2 model weights…We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems.We also think governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems. If pursued, these efforts could yield a better evidence base for decisions by AI labs and governments regarding publication decisions and AI policy more broadly.The arrogance in this statement is only surpassed by the futility: here we are six years later, and the entire world has access to the weights of a dramatically superior model. OpenAI’s gambit for control — enforced by the U.S. government — has utterly failed. In the meantime, how much innovation has been foregone by virtue of leading edge models not having open weights? More generally, how much time and energy has been spent lobbying for a government-enforced moat that DeepSeek just obliterated, that would have been better devoted to actual innovation?So you’re not worried about AI doom scenarios?I definitely understand the concern, and just noted above that we are reaching the stage where AIs are training AIs and learning reasoning on their own. I recognize, though, that there is no stopping this train. More than that, this is exactly why openness is so important: we need more AIs in the world, not an unaccountable board ruling all of us.Wait, why is China open-sourcing their model?Well DeepSeek is, to be clear; CEO Liang Wenfeng said in a must-read interview that open source is key to attracting talent:In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect. There is also a cultural attraction for a company to do this.The interviewer asked if this would change:DeepSeek, right now, has a kind of idealistic aura reminiscent of the early days of OpenAI, and it’s open source. Will you change to closed source later on? Both OpenAI and Mistral moved from open-source to closed-source.We will not change to closed source. We believe having a strong technical ecosystem first is more important.This actually makes sense beyond idealism. If models are commodities — and they are certainly looking that way — then long-term differentiation comes from having a superior cost structure; that is exactly what DeepSeek has delivered, which itself is resonant of how China has come to dominate other industries. This is also contrary to how most U.S. companies think about differentiation, which is through having differentiated products that can sustain larger margins.So is OpenAI screwed?Not necessarily. ChatGPT made OpenAI the accidental consumer tech company, which is to say a product company; there is a route to building a sustainable consumer business on commoditizable models through some combination of subscriptions and advertisements. And, of course, there is the bet on winning the race to AI take-off.Anthropic, on the other hand, is probably the biggest loser of the weekend. DeepSeek made it to number one in the App Store, simply highlighting how Claude, in contrast, hasn’t gotten any traction outside of San Francisco. The API business is doing better, but API businesses in general are the most susceptible to the commoditization trends that seem inevitable (and do note that OpenAI and Anthropic’s inference costs look a lot higher than DeepSeek because they were capturing a lot of margin; that’s going away).So this is all pretty depressing, then?Actually, no. I think that DeepSeek has provided a massive gift to nearly everyone. The biggest winners are consumers and businesses who can anticipate a future of effectively-free AI products and services. Jevons Paradox will rule the day in the long run, and everyone who uses AI will be the biggest winners.Another set of winners are the big consumer tech companies. A world of free AI is a world where product and distribution matters most, and those companies already won that game; The End of the Beginning was right.China is also a big winner, in ways that I suspect will only become apparent over time. Not only does the country have access to DeepSeek, but I suspect that DeepSeek’s relative success to America’s leading AI labs will result in a further unleashing of Chinese innovation as they realize they can compete.That leaves America, and a choice we have to make. We could, for very logical reasons, double down on defensive measures, like massively expanding the chip ban and imposing a permission-based regulatory regime on chips and semiconductor equipment that mirrors the E.U.’s approach to tech; alternatively, we could realize that we have real competition, and actually give ourself permission to compete. Stop wringing our hands, stop campaigning for regulations — indeed, go the other way, and cut out all of the cruft in our companies that has nothing to do with winning. If we choose to compete we can still win, and, if we do, we will have a Chinese company to thank.I wrote a follow-up to this Article in this Daily Update.

Stratechery Plus + AsianometryMonday, January 20, 2025Listen to PodcastListen to this post:Log in to listenBack in 2022, I rebranded a Stratechery subscription as Stratechery Plus, a bundle of content that would enhance the value of your subscription; today the bundle includes:
- The Stratechery Update
- The Stratechery Interview series
- The Sharp Tech with Ben Thompson podcast, hosted by Andrew Sharp
- The Sharp China with Bill Bishop podcast, hosted by Andrew Sharp
- The Dithering podcast, with John Gruber and myself
- The Greatest of All Talk podcast, with Ben Golliver and Andrew Sharp
Today I am excited to announce a new addition: the Asianometry newsletter and podcast, by Jon Yu.Asianometry is one of the best tech YouTube channels in existence, with over 768,000 subscribers. Jon produces in-depth videos explaining every aspect of technology, with a particular expertise in semiconductors. To give you an idea of Jon’s depth, he has made 31 videos about TSMC alone. His semiconductor course includes 30 videos covering everything from designing chips to how ASML builds EUV machines to Moore’s Law. His video on the end of Dennard’s Law is a particular standout:https://www.youtube.com/embed/7p8ZeSbblec?list=PLKtxx9TnH76RKmJlFIfs7NumNp1RPNIQvJon is about more than semiconductors though: he’s made videos about other tech topics like The Tragedy of Compaq, and non-tech topics like Japanese Whisky and Taiwan convenience stores. In short, Jon is an intensely curious person who does his research, and we are blessed that he puts in the work to share what he learns.I am blessed most of all, however. Over the last year Jon has been making Stratechery Articles into video essays and cutting clips for Sharp Tech; he did a great job with one of my favorite articles of 2024:https://www.youtube.com/embed/3cgjWHzFdj4?feature=oembedAnd now, starting today, Stratechery Plus subscribers can get exclusive access to Asianometry’s content in newsletter and podcast form. The Asianometry YouTube Channel will remain free and Jon’s primary focus, but from now on all of his content will be simultaneously released as a transcript and podcast. Stratechery Plus subscribers can head over to the new Asianometry Passport site to subscribe to his emails, or to add the podcast feed to your favorite podcast player.And, of course, subscribe to Jon’s YouTube channel, along with Stratechery Plus.
AI’s Uneven ArrivalMonday, January 13, 2025Listen to Podcast Watch on YouTubeListen to this post:Log in to listenBox’s route to its IPO, ten years ago this month, was a difficult one: the company first released an S-1 in March 2014, and potential investors were aghast at the company’s mounting losses; the company took a down round and, eight months later, released an updated S-1 that created the template for money-losing SaaS businesses to explain themselves going forward:Our business model focuses on maximizing the lifetime value of a customer relationship. We make significant investments in acquiring new customers and believe that we will be able to achieve a positive return on these investments by retaining customers and expanding the size of our deployments within our customer base over time…We experience a range of profitability with our customers depending in large part upon what stage of the customer phase they are in. We generally incur higher sales and marketing expenses for new customers and existing customers who are still in an expanding stage…For typical customers who are renewing their Box subscriptions, our associated sales and marketing expenses are significantly less than the revenue we recognize from those customers.This was the justification for those top-line losses; I wrote in an Update at the time:That right there is the SaaS business model: you’re not so much selling a product as you are creating annuities with a lifetime value that far exceeds whatever you paid to acquire them. Moreover, if the model is working — and in retrospect, we know it has for that 2010 cohort — then I as an investor absolutely would want Box to spend even more on customer acquisition, which, of course, Box has done. The 2011 cohort is bigger than 2010, the 2012 cohort bigger than 2011, etc. This, though, has meant that the aggregate losses have been very large, which looks bad, but, counterintuitively, is a good thing.Numerous SaaS businesses would include some version of this cohort chart in their S-1’s, each of them manifestations of what I’ve long considered tech’s sixth giant: Apple, Amazon, Google, Meta, Microsoft, and what I call “Silicon Valley Inc.”, the pipeline of SaaS companies that styled themselves as world-changing startups but which were, in fact, color-by-numbers business model disruptions enabled by cloud computing and a dramatically expanded venture capital ecosystem that increasingly accepted relatively low returns in exchange for massively reduced risk profiles.This is not, to be clear, an Article about Box, or any one SaaS company in particular; it is, though, an exploration of how an era that opened — at least in terms of IPOs — a decade ago is both doomed in the long run and yet might have more staying power than you expect.Digital Advertising DifferencesJohn Wanamaker, a department store founder and advertising pioneer, famously said, “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” That, though, was the late 19th century; the last two decades have seen the rise of digital advertising, the defining characteristic of which is knowledge about whom is being targeted, and whether or not they converted. The specifics of how this works have shifted over time, particularly with the crackdown on cookies and Apple’s App Tracking Transparency initiative, which made digital advertising less deterministic and more probabilistic; the probabilities at play, though, are a lot closer to 100% than they are to a flip-of-a-coin.What is interesting is that this advertising approach hasn’t always worked for everything, most notably some of the most advertising-centric businesses in the world. Back in 2016 Procter & Gamble announced they were scaling back targeted Facebook ads; from the Wall Street Journal:Procter & Gamble Co., the biggest advertising spender in the world, will move away from ads on Facebook that target specific consumers, concluding that the practice has limited effectiveness. Facebook Inc. has spent years developing its ability to zero in on consumers based on demographics, shopping habits and life milestones. P&G, the maker of myriad household goods including Tide and Pampers, initially jumped at the opportunity to market directly to subsets of shoppers, from teenage shavers to first-time homeowners.Marc Pritchard, P&G’s chief marketing officer, said the company has realized it took the strategy too far. “We targeted too much, and we went too narrow,” he said in an interview, “and now we’re looking at: What is the best way to get the most reach but also the right precision?”…On a broader scale, P&G’s shift highlights the limits of such targeting for big brands, one of the cornerstones of Facebook’s ad business. The social network is able to command higher prices for its targeted marketing; the narrower the targeting the more expensive the ad.P&G is a consumer packaged goods (CPG) company, and what mattered most for CPG companies was shelf space. Consumers would become aware of a brand through advertising, motivated to buy through things like coupons, and the payoff came when they were in the store and chose one of the CPG brands off the shelf; of course CPG companies paid for that shelf space, particularly coveted end-caps that made it more likely consumers saw the brands they were familiar with through advertising. There were returns to scale, as well: manufacturing is a big one; the more advertising you bought the less paid per ad; more importantly, the more shelf space you had the more room you had to expand your product lines, and crowd out competitors.The advertising component specifically was usually outsourced to ad agencies, for reasons I explained in a 2017 Article:Few advertisers actually buy ads, at least not directly. Way back in 1841, Volney B. Palmer, the first ad agency, was opened in Philadelphia. In place of having to take out ads with multiple newspapers, an advertiser could deal directly with the ad agency, vastly simplifying the process of taking out ads. The ad agency, meanwhile, could leverage its relationships with all of those newspapers by serving multiple clients:It’s a classic example of how being in the middle can be a really great business opportunity, and the utility of ad agencies only increased as more advertising formats like radio and TV became available. Particularly in the case of TV, advertisers not only needed to place ads, but also needed a lot more help in making ads; ad agencies invested in ad-making expertise because they could scale said expertise across multiple clients.At the same time, the advertisers were rapidly expanding their geographic footprints, particularly after the Second World War; naturally, ad agencies increased their footprint at the same time, often through M&A. The overarching business opportunity, though, was the same: give advertisers a one-stop shop for all of their advertising needs.The Internet provided two big challenges to this approach. First, the primary conversion point changed from the cash register to the check-out page; the products that benefited the most were either purely digital (like apps) or — at least in the earlier days of e-commerce — spur-of-the-moment purchases without major time pressure. CPG products didn’t really fall in either bucket.Second, these types of purchases aligned well with the organizing principle of digital advertising, which is the individual consumer. What Facebook — now Meta — is better at than anyone in the world is understanding consumers not as members of a cohort or demographic group but rather as individuals, and serving them ads that are uniquely interesting to them.Notice, though, that nothing in the traditional advertiser model was concerned with the individual: brands are created for cohorts or demographic groups, because they need to be manufactured at scale; then, ad agencies would advertise at scale — making money along the way — and the purchase would be consummated in physical stores at some later point in time, constrained (and propelled by) limited shelf space. Thus P&G’s pullback — and thus the opportunity for an entirely new wave of companies that were built around digital advertising and its deep personalization from the get-go.This bifurcation manifested itself most starkly in the summer of 2020, when large advertisers boycotted Facebook over the company’s refusal to censor then-President Trump; Facebook was barely affected. I wrote in Apple and Facebook:This is a very different picture from Facebook, where as of Q1 2019 the top 100 advertisers made up less than 20% of the company’s ad revenue; most of the $69.7 billion the company brought in last year came from its long tail of 8 million advertisers…This explains why the news about large CPG companies boycotting Facebook is, from a financial perspective, simply not a big deal. Unilever’s $11.8 million in U.S. ad spend, to take one example, is replaced with the same automated efficiency that Facebook’s timeline ensures you never run out of content. Moreover, while Facebook loses some top-line revenue — in an auction-based system, less demand corresponds to lower prices — the companies that are the most likely to take advantage of those lower prices are those that would not exist without Facebook, like the direct-to-consumer companies trying to steal customers from massive conglomerates like Unilever.In this way Facebook has a degree of anti-fragility that even Google lacks: so much of its business comes from the long tail of Internet-native companies that are built around Facebook from first principles, that any disruption to traditional advertisers — like the coronavirus crisis or the current boycotts — actually serves to strengthen the Facebook ecosystem at the expense of the TV-centric ecosystem of which these CPG companies are a part.It has been nine years since that P&G pullback I referenced above, and one of the big changes that P&G has made in that timeframe is to take most of their ad-buying in-house. This was in the long run inevitable, as the Internet ate everything, including traditional TV viewing, and as the rise of Aggregation platforms meant that the number of places you needed to actually buy an ad to reach everyone decreased even as potential reach increased. Those platforms also got better: programmatic platforms achieve P&G’s goal of mass reach in a way that actually increased efficiency instead of over-spending to over-target; programmatic advertising also covers more platforms now, including TV.o3 AmmunitionLate last month OpenAI announced its o3 model, validating its initial o1 release and the returns that come from test-time scaling; I explained in an Update when o1 was released:There has been a lot of talk about the importance of scale in terms of LLM performance; for auto-regressive LLMs that has meant training scale. The more parameters you have, the larger the infrastructure you need, but the payoff is greater accuracy because the model is incorporating that much more information. That certainly still applies to o1, as the chart on the left indicates.It’s the chart on the right that is the bigger deal: o1 gets more accurate the more time it spends on compute at inference time. This makes sense intuitively given what I laid out above: the more time spent on compute the more time o1 can spend spinning up multiple chains-of-thought, checking its answers, and iterating through different approaches and solutions.It’s also a big departure from how we have thought about LLMs to date: one of the “benefits” of auto-regressive LLMs is that you’re only generating one answer in a serial manner. Yes, you can get that answer faster with beefier hardware, but that is another way of saying that the pay-off from more inference compute is getting the answer faster; the accuracy of the answer is a function of the underlying model, not the amount of compute brought to bear. Another way to think about it is that the more important question for inference is how much memory is available; the more memory there is, the larger the model, and therefore, the greater amount of accuracy.In this o1 represents a new inference paradigm: yes, you need memory to load the model, but given the same model, answer quality does improve with more compute. The way that I am thinking about it is that more compute is kind of like having more branch predictors, which mean more registers, which require more cache, etc.; this isn’t a perfect analogy, but it is interesting to think about inference compute as being a sort of dynamic memory architecture for LLMs that lets them explore latent space for the best answer.o3 significantly outperforms o1, and the extent of that outperformance is dictated by how much computing is allocated to the problem at hand. One of the most stark examples was o3‘s performance on the ARC prize, a visual puzzle test that is designed to be easy for humans but hard for LLMs:OpenAI’s new o3 system – trained on the ARC-AGI-1 Public Training set – has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. A high-compute (172x) o3 configuration scored 87.5%.This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models. For context, ARC-AGI-1 took 4 years to go from 0% with GPT-3 in 2020 to 5% in 2024 with GPT-4o. All intuition about AI capabilities will need to get updated for o3…Despite the significant cost per task, these numbers aren’t just the result of applying brute force compute to the benchmark. OpenAI’s new o3 model represents a significant leap forward in AI’s ability to adapt to novel tasks. This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs. o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain.Of course, such generality comes at a steep cost, and wouldn’t quite be economical yet: you could pay a human to solve ARC-AGI tasks for roughly $5 per task (we know, we did that), while consuming mere cents in energy. Meanwhile o3 requires $17-20 per task in the low-compute mode. But cost-performance will likely improve quite dramatically over the next few months and years, so you should plan for these capabilities to become competitive with human work within a fairly short timeline.I don’t believe that o3 and inference-time scaling will displace traditional LLMs, which will remain both faster and cheaper; indeed, they will likely make traditional LLMs better through their ability to generate synthetic data for further scaling of pre-training. There remains a large product overhang for traditional LLMS — the technology is far more capable than the products that have been developed to date — but even the current dominant product, chatbots, are better experienced with a traditional LLM.That very use case, however, gets at traditional LLM limitations: because they lack the ability to think and decide and verify they are best thought of as a tool for humans to leverage. Indeed, while conventional wisdom about these models is that it allows anyone to generate good enough writing and research, the biggest returns come to those with the most expertise and agency, who are able to use their own knowledge and judgment to reap efficiency gains while managing hallucinations and mistakes.What o3 and inference-time scaling point to is something different: AI’s that can actually be given tasks and trusted to complete them. This, by extension, looks a lot more like an independent worker than an assistant — ammunition, rather than a rifle sight. That may seem an odd analogy, but it comes from a talk Keith Rabois gave at Stanford:https://videopress.com/embed/Xxwv3tcK?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataSo I like this idea of barrels and ammunition. Most companies, once they get into hiring mode…just hire a lot of people, you expect that when you add more people your horsepower or your velocity of shipping things is going to increase. Turns out it doesn’t work that way. When you hire more engineers you don’t get that much more done. You actually sometimes get less done. You hire more designers, you definitely don’t get more done, you get less done in a day.The reason why is because most great people actually are ammunition. But what you need in your company are barrels. And you can only shoot through the number of unique barrels that you have. That’s how the velocity of your company improves is adding barrels. Then you stock them with ammunition, then you can do a lot. You go from one barrel company, which is mostly how you start, to a two barrel company, suddenly you get twice as many things done in a day, per week, per quarter. If you go to three barrels, great. If you go to four barrels, awesome. Barrels are very difficult to find. But when you have them, give them lots of equity. Promote them, take them to dinner every week, because they are virtually irreplaceable. They are also very culturally specific. So a barrel at one company may not be a barrel at another company because one of the ways, the definition of a barrel is, they can take an idea from conception and take it all the way to shipping and bring people with them. And that’s a very cultural skill set.The promise of AI generally, and inference-time scaling models in particular, is that they can be ammunition; in this context, the costs — even marginal ones — will in the long run be immaterial compared to the costs of people, particularly once you factor in non-salary costs like coordination and motivation.The Uneven AI ArrivalThere is a long way to go to realize this vision technically, although the arrival of first o1 and then o3 signal that the future is arriving more quickly than most people realize. OpenAI CEO Sam Altman wrote on his blog:We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.I grant the technical optimism; my definition of AGI is that it can be ammunition, i.e. it can be given a task and trusted to complete it at a good-enough rate (my definition of Artificial Super Intelligence (ASI) is the ability to come up with the tasks in the first place). The reason for the extended digression on advertising, however, is to explain why I’m skeptical about AI “materially chang[ing] the output of companies”, at least in 2025.In this analogy CPG companies stand in for the corporate world generally. What will become clear once AI ammunition becomes available is just how unsuited most companies are for high precision agents, just as P&G was unsuited for highly-targeted advertising. No matter how well-documented a company’s processes might be, it will become clear that there are massive gaps that were filled through experience and tacit knowledge by the human ammunition.SaaS companies, meanwhile, are the ad agencies. The ad agencies had value by providing a means for advertisers to scale to all sorts of media across geographies; SaaS companies have value by giving human ammunition software to do their job. Ad agencies, meanwhile, made money by charging a commission on the advertising they bought; SaaS companies make money by charging a per-seat licensing fee. Look again at that S-1 excerpt I opened with:Our business model focuses on maximizing the lifetime value of a customer relationship. We make significant investments in acquiring new customers and believe that we will be able to achieve a positive return on these investments by retaining customers and expanding the size of our deployments within our customer base over time…The positive return on investment comes from retaining and increasing seat licenses; those seats, however, are proxies for actually getting work done, just as advertising was just a proxy for actually selling something. Part of what made direct response digital advertising fundamentally different is that it was tied to actually making a sale, as opposed to lifting brand awareness, which is a proxy for the ultimate goal of increasing revenue. To that end, AI — particularly AI’s like o3 that scale with compute — will be priced according to the value of the task they complete; the amount that companies will pay for inference time compute will be a function of how much the task is worth. This is analogous to digital ads that are priced by conversion, not CPM.The companies that actually leveraged that capability, however, were not, at least for a good long while, the companies that dominated the old advertising paradigm. Facebook became a juggernaut by creating its own customer base, not by being the advertising platform of choice for companies like P&G; meanwhile, TV and the economy built on it stayed relevant far longer than anyone expected. And, by the time TV truly collapsed, both the old guard and digital advertising had evolved to the point that they could work together.If something similar plays out with AI agents, then the most important AI customers will primarily be new companies, and probably a lot of them will be long tail type entities that take the barrel and ammunition analogy to its logical extreme. Traditional companies, meanwhile, will struggle to incorporate AI (outside of whole-scale job replacement a la the mainframe); the true AI takeover of enterprises that retain real world differentiation will likely take years.None of this is to diminish what is coming with AI; rather, as the saying goes, the future may arrive but be unevenly distributed, and, contrary to what you might think, the larger and more successful a company is the less they may benefit in the short term. Everything that makes a company work today is about harnessing people — and the entire SaaS ecosystem is predicated on monetizing this reality; the entities that will truly leverage AI, however, will not be the ones that replace them, but start without them.https://www.youtube.com/embed/rRdxfndiuLE?feature=oembed
The 2024 Stratechery Year in ReviewThursday, December 19, 2024Stratechery, incredibly enough, has been my full-time job for over a decade; this is the 12th year-in-review. Here are the previous editions:2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013It has long been a useful cliché to say that covering tech is easy, because something is always happening; now that that something is AI, that is more true than ever. Nearly every Article on Stratechery this year was about AI in some way or another, and that is likely to be true for years to come.This year Stratechery published 29 free Articles, 109 subscriber Updates, and 40 Interviews. Today, as per tradition, I summarize the most popular and most important posts of the year.The Five Most-Viewed ArticlesThe five most-viewed articles on Stratechery according to page views:
1. Intel Honesty — The best way to both save Intel and have leading edge manufacturing in the U.S. is to split the company, and for the U.S. government to pick up the bill via purchase guarantees.
2. Gemini and Google’s Culture — The Google Gemini fiasco shows that the biggest challenge for Google in AI is not business model but rather company culture; change is needed from the top down.
3. Intel’s Humbling — Intel under Pat Gelsinger is reaping the disaster that came from a lack of investment and execution a decade ago; the company, though, appears to be headed in the right direction, as evidenced by its execution and recent deal with UMC.
4. The Apple Vision Pro — The Apple Vision Pro is a disappointment for productivity, in part because of choices made to deliver a remarkable entertainment experience. Plus, the future of AR/VR for Apple and Meta.
5. MKBHDs For Everything — Marques Brownlee has tremendous power because he can go direct to consumers; that is possible in media, and AI will make it possible everywhere.
AI and the FutureLooking ahead to how AI will change everything.
- Enterprise Philosophy and The First Wave of AI — The first wave of successful AI implementations will probably look more like the first wave of computing, which was dominated by large-scale enterprise installations that eliminated jobs. Consumer will come later. YouTube
- The Gen AI Bridge to the Future — Generative AI is the bridge to the next computing paradigm of wearables, just like the Internet bridged the gap from PCs to smartphones.
- The New York Times’ AI Opportunity — The New York Times is suing OpenAI, but it is the New York Times that stands to benefit the most from large language models, thanks to its transformation to being an Internet entity. YouTube
- AI Integration and Modularization — Breaking down the Big Tech AI landscape through the lens of integration and modularization. YouTube
- Aggregator’s AI Risk — A single AI can never make everyone happy, which is fundamentally threatening to the Aggregator business model; the solution is personalized AI. YouTube
Government and RegulationAn emerging theme this year — which I expect to continue alongside AI — is the rising importance of non-economic factors in terms of technological development, even as regulators ramp up pressure on the giants of the Aggregator era.
- A Chance to Build — Silicon Valley has always been deeply integrated with Asia; Trump’s attempt to change trade could hurt Silicon Valley more than expected, and also present opportunities to build something new. YouTube
- Intel’s Death and Potential Revival — Intel died when mobile cost it its software differentiation; if the U.S. wants a domestic foundry, then it ought to leverage the need for AI chips to make an independent Intel foundry viable. YouTube
- The E.U. Goes Too Far — Recent E.U. regulatory decisions cross the line from market correction to property theft; if the E.U. continues down this path they are likely to see fewer new features and no new companies. YouTube
- Friendly Google and Enemy Remedies — The DOJ brought the right kind of case against an Aggregator, which stagnates by being too nice; the goal is for companies to act like they actually have enemies. YouTube
- United States v. Apple — Apple is being sued by the DOJ, but most of the complaints aren’t about the App Store. I think, though, Apple’s approach to the App Store is what led to this case.
Big TechThe biggest tech companies, as usual, provided the most consistent lens on how the world is changing.
- Meta’s AI Abundance — Meta is well-positioned to the biggest beneficiary of AI and the largest company in the world. YouTube
- Gemini 1.5 and Google’s Nature — Google Cloud Next 2024 was Google’s most impressive assertion yet that it has the AI scale advantage and is determined to use it. YouTube (See also: Integration and Android)
- Elon Dreams and Bitter Lessons — SpaceX’s triumph is downstream of a dream and getting the cost structure necessary to make it happen; Elon Musk is trying the same approach for Tesla self-driving cars. YouTube
- Apple Intelligence is Right On Time — Apple is expected to announce a range of AI features at WWDC; the company is well placed to benefit from AI: they are not too late, but right on time. YouTube (See also: WWDC, Apple Intelligence, Apple Aggregates AI)
- Nvidia Waves and Moats — Nvidia’s GTC was an absolute spectacle; it was also a different kind of keynote than before ChatGPT, which is related to Nvidia’s need to dig a new kind of software moat. YouTube
Other Articles this year included: The Apple Vision Pro’s Missing Apps | Sora, Groq, and Virtual Reality | Meta and Open | Meta and Reasonable Doubt | The Great Flattening | Windows Returns | Crashes and Competition | Boomer AppleStratechery InterviewsThursdays are for Stratechery Interview — in podcast and transcript form — with public company executives, founders and private company executives, and other analysts.Public Company Executive Interviews:Arm CEO Rene Haas | Netflix co-CEO Greg Peters | Zoom CEO Eric Yuan | dLocal Founder Sebastian Kanovich and CEO Pedro Arnt | Google Cloud CEO Thomas Kurian | Walmart CEO Doug McMillon | Microsoft CEO Satya Nadella and CTO Kevin Scott | AMD CEO Lisa Su | Google SVP Rick Osterloh | Zillow CEO Jeremy Wacksman | Meta CTO Andrew Bosworth | Salesforce CEO Marc Benioff | Synopsys CEO Sassine GhaziStartup/Private Company Executive Interviews:Rescale CEO Joris Poort | Databricks CEO Ali Ghodsi | Terraform Industries CEO Casey Handmer | Scale AI CEO Alex Wang | Canva CEO Melanie PerkinsAnalysts:Om Malik on tech history | Joanna Stern on the Apple Vision Pro | Eric Seufert on digital advertising in February and October | Matthew Ball on VR and gaming | Daniel Gross and Nat Friedman on AI in February and June | Hugo Barra on AR and VR in March and October | Benedict Evans on regulation and AI | Michael Morton on e-commerce | Matthew Belloni on Hollywood and streaming | Marques Brownlee (MKBHD) on YouTube | Ben Bajarin on Apple and Intel | Craig Moffett on Apple and telecoms | Gregory Allen in October on the U.S. defense industry, and December on the China chip ban | Timothy B. Lee on AI and self-driving cars | Dylan Patel and Doug O’Laughlin on the semiconductor industry | Byrne Hobart on innovation | Tae Kim on The Nvidia WayThe Year in Stratechery UpdatesSome of my favorite Stratechery Updates:
- January 29: Apple and the DMA, Apple and “Or”, A Reluctant Apple Apologist (See also: European Commission Charges Apple, Apple Delays New Features for E.U.)
- February 19: Xbox’s Announcement; Microsoft’s Messy Middle; Apple in Europe, Continued
- March 12: Walmart Earnings, Walmart Connect and Closing the Loop, Walmart Acquires Vizio
- April 1: MLS on Vision Pro, The Vision Pro’s Missing Content, The Vision Pro’s DRI
- May 7: TSMC Earnings, TSMC’s Pricing Mistake, Intel v. TSMC
- May 20: Netflix and the NFL, Netflix Internalizes Ads, Comcast’s Bundle (See also: Netflix’s Boxing Event, Customer Acquisition vs. Churn Mitigation, Accounting for Events)
- June 18: FTC Sues Adobe, The Legal Question, The Value of Doing Right
- June 24: Perplexity and Robots.txt, Perplexity’s Defense, Google and Competition
- July 17: Tech For Trump, Breaking the Deal, From Inertness to Interest
- August 26: Telegram CEO Arrested, Telegram’s Non-Encrypted Advantage, Telegram Complexities
- September 16: OpenAI’s New Model, How o1 Works, Scaling Inference
- September 30: More on Orion, Where Vision Pro Went Wrong, Apple’s Response and Meta’s Motivation
- October 1: Taking Waymo, Uber and Waymo (See also: GM Kills Cruise, Fleets Versus Autonomy, Robotaxi Outlook)
- October 7: U.S. Communications Hacked, The History of CALEA, Encryption and Backdoors
- October 22: Stripe Acquires Bridge, Stablecoins, Platform of Platforms
- October 28: Trump on Rogan, The Voters Decide, The Podcast Election
- November 6: President Trump, Take Two; Big Tech, Little Tech, Chips, and Hardware; Elon Musk’s Triumph
- November 13: Shopify Earnings, Software Self-Awareness, Rebels and the Arms Dealer
- December 4: AWS re:Invent, Nova and Model Choice, AI as Commodity
- December 17: Google Announces Veo 2, The Empire Strikes Back, Free ChatGPT Search
I am so grateful to the subscribers that make it possible for me to do this as a job. I wish all of you a Merry Christmas and Happy New Year, and I’m looking forward to a great 2025!
Intel’s Death and Potential RevivalMonday, December 9, 2024Listen to PodcastWatch on YouTubeListen to this post:Log in to listenIn 1980 IBM, under pressure from its customers to provide computers for personal use, not just mainframes, set out to create the IBM PC; given the project’s low internal priority but high external demand they decided to outsource two critical components: Microsoft would provide the DOS operating system, which would run on the Intel 8088 processor.Those two deals would shape the computing industry for the following 27 years. Given that the point of the personal computer was to run applications, the operating system that provided the APIs for those applications would have unassailable lock-in, leading to Microsoft’s dominance with first DOS and then Windows, which was backwards compatible.The 8088 processor, meanwhile, was a low-cost variant of the 8086 processor; up to that point most new processors came with their own instruction set, but the 8088 and 8086 used the same instruction set, which became the foundation of Intel processors going forward. That meant that the 286 and 386 processors that followed were backwards compatible with the 8088 in the IBM PC; in other words, Intel, too, had lock-in, and not just with MS-DOS: while the majority of applications leveraged operating system-provided APIs, it was much more common at that point in computing history to leverage lower level APIs, including calling on the processor instruction set directly. This was particularly pertinent for things like drivers, which powered all of the various peripherals a PC required.Intel’s CISC MoatThe 8086 processor that undergirded the x86 instruction set was introduced in 1978, when memory was limited, expensive, and slow; that’s why the x86 used Complex Instruction Set Computing (CISC), which combined multiple steps into a single instruction. The price of this complexity was the necessity of microcode, dedicated logic that translated CISC instructions into its component steps so they could actually be executed.The same year that IBM cut those deals, however, was the year that David Patterson and a team at Berkeley started work on what became known as the RISC-1 processor, which took an entirely different approach: Reduced Instruction Set Computing (RISC) replaced the microcode-focused transistors with registers, i.e. memory that operated at the same speed as the processor itself, and filled them with simple instructions that corresponded directly to transistor functionality. This would, in theory, allow for faster computing with the same number of transistors, but memory access was still expensive and more likely to be invoked given the greater number of instructions necessary to do anything, and programs and compilers needed to be completely reworked to take advantage of the new approach.Intel, more than anyone, realized that this would be manageable in the long run. “Moore’s Law”, the observation that the number of transistors in an integrated circuit doubles every two years, was coined by Gordon Moore, their co-founder and second CEO; the implication for instruction sets was that increased software complexity and slow hardware would be solved through ever faster chips, and those chips could get even faster if they were simplified RISC designs. That is why most of the company wanted, in the mid-1980s, to abandon x86 and its CISC instruction set for RISC.There was one man, however, who interpreted the Moore’s Law implications differently, and that was Pat Gelsinger; he led the development of the 486 processor and was adamant that Intel stick with CISC, as he explained in an oral history at the Computer Museum:https://videopress.com/embed/k9yIgNfM?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataGelsinger: We had a mutual friend that found out that we had Mr. CISC working as a student of Mr. RISC, the commercial versus the university, the old versus the new, teacher versus student. We had public debates of John and Pat. And Bear Stearns had a big investor conference, a couple thousand people in the audience, and there was a public debate of RISC versus CISC at the time, of John versus Pat.And I start laying out the dogma of instruction set compatibility, architectural coherence, how software always becomes the determinant of any computer architecture being developed. “Software follows instruction set. Instruction set follows Moore’s Law. And unless you’re 10X better and John, you’re not 10X better, you’re lucky if you’re 2X better, Moore’s Law will just swamp you over time because architectural compatibility becomes so dominant in the adoption of any new computer platform.” And this is when x86– there was no server x86. There’s no clouds at this point in time. And John and I got into this big public debate and it was so popular.Brock: So the claim wasn’t that the CISC could beat the RISC or keep up to what exactly but the other overwhelming factors would make it the winner in the end.Gelsinger: Exactly. The argument was based on three fundamental tenets. One is that the gap was dramatically overstated and it wasn’t an asymptotic gap. There was a complexity gap associated with it but you’re going to make it leap up and that the CISC architecture could continue to benefit from Moore’s Law. And that Moore’s Law would continue to carry that forward based on simple ones, number of transistors to attack the CISC problems, frequency of transistors. You’ve got performance for free. And if that gap was in a reasonable frame, you know, if it’s less than 2x, hey, in a Moore’s Law’s term that’s less than a process generation. And the process generation is two years long. So how long does it take you to develop new software, porting operating systems, creating optimized compilers? If it’s less than five years you’re doing extraordinary in building new software systems. So if that gap is less than five years I’m going to crush you John because you cannot possibly establish a new architectural framework for which I’m not going to beat you just based on Moore’s Law, and the natural aggregation of the computer architecture benefits that I can bring in a compatible machine. And, of course, I was right and he was wrong.Intel would, over time, create more RISC-like processors, switching out microcode for micro-ops processing units that dynamically generated RISC-like instructions from CISC-based software that maintained backwards compatibility; Gelsinger was right that no one wanted to take the time to rewrite all of the software that assumed an x86 instruction set when Intel processors were getting faster all of the time, and far out-pacing RISC alternatives thanks to Intel’s manufacturing prowess.That, though, turned out to be Intel’s soft underbelly; while the late Intel CEO Paul Ottelini claimed that he turned down the iPhone processor contract because of price, Tony Fadell, who led the creation of the iPod and iPhone hardware, told me in a Stratechery Interview that the real issue was Intel’s obsession with performance and neglect of efficiency.The new dimension that always came in with embedded computing was always the power element, because on battery-operated devices, you have to rethink how you do your interrupt structures, how you do your networking, how you do your memory. You have to think about so many other parameters when you think about power and doing enough processing effectively, while having long battery life. So everything for me was about long, long battery life and why do we do what we do? David Tupman was on the team, the iPod team with me, he would always say every nanocoulomb was sacred, and we would go after that and say, “Okay, where’s that next coulomb? Where are we going to go after it?” And so when you take that microscopic view of what you’re building, you look at the world very differently.For me, when it came to Intel at the time, back in the mid-2000s, they were always about, “Well, we’ll just repackage what we have on the desktop for the laptop and then we’ll repackage that again for embedding.” It reminded me of Windows saying, “I’m going to do Windows and then I’m going to do Windows Mobile and I’m going to do Windows embedded.” It was using those same cores and kernels and trying to slim them down.I was always going, “Look, do you see how the iPhone was created? It started with the guts of the iPod, and we grew up from very little computing, and very little space, and we grew into an iPhone, and added more layers to it.” But we weren’t taking something big and shrinking it down. We were starting from the bottom up and yeah, we were taking Mac OS and shrinking it down, but we were significantly shrinking it down. Most people don’t want to take those real hard cuts to everything because they’re too worried about compatibility. Whereas if you’re just taking pieces and not worrying about compatibility, it’s a very different way of thinking about how building and designing products happens.This is why I was so specific with that “27 year” reference above; Apple’s 2007 launch of the iPhone marked the end of both Microsoft and Intel’s dominance, and for the same reason. The shift to efficiency as the top priority meant that you needed to rewrite everything; that, by extension, meant that Microsoft’s API and Intel’s x86 instruction set were no longer moats but millstones. On the operating side Apple stripped macOS to the bones and rebuilt it for efficiency; that became iOS, and the new foundation for apps; on the processor side Apple used processors based on the ARM instruction set, which was RISC from the beginning. Yes, that meant a lot of things had to be rewritten, but here the rewriting wasn’t happening by choice, but by necessity.This leads, as I remarked to Fadell in that interview, to a rather sympathetic interpretation of Microsoft and Intel’s failure to capture the mobile market; neither company had a chance. They were too invested in the dominant paradigm at the time, and thus unable to start from scratch; by the time they realized their mistake, Apple, Android, and ARM had already won.Intel’s Missed OpportunityIt was their respective response to missing mobile that saved Microsoft, and doomed Intel. For the first seven years of the iPhone both companies refused to accept their failure, and tried desperately to leverage what they viewed as their unassailable advantages: Microsoft declined to put its productivity applications on iOS or Android, trying to get customers to adopt Windows Mobile, while Intel tried to bring its manufacturing prowess to bear to build processors that were sufficiently efficient while still being x86 compatible.It was in 2014 that their paths diverged: Microsoft named Satya Nadella its new CEO, and his first public decision was to launch Office on iPad. This was a declaration of purpose: Microsoft would no longer be defined by Windows, and would instead focus on Azure and the cloud; no, that didn’t have the software lock-in of Windows — particularly since a key Azure decision was shifting from Windows servers to Linux — but it was a business that met Microsoft’s customers where they were, and gave the company a route to participating in the massive business opportunities enabled by mobile (given that most apps are in fact cloud services), and eventually, AI.The equivalent choice for Intel would have been to start manufacturing ARM chips for 3rd parties, i.e. becoming a foundry instead of an integrated device manufacturer (IDM); I wrote that they should do exactly that in 2013:It is manufacturing capability, on the other hand, that is increasingly rare, and thus, increasingly valuable. In fact, today there are only four major foundries: Samsung, GlobalFoundries, Taiwan Semiconductor Manufacturing Company, and Intel. Only four companies have the capacity to build the chips that are in every mobile device today, and in everything tomorrow.Massive demand, limited suppliers, huge barriers to entry. It’s a good time to be a manufacturing company. It is, potentially, a good time to be Intel. After all, of those four companies, the most advanced, by a significant margin, is Intel. The only problem is that Intel sees themselves as a design company, come hell or high water.Making chips for other companies would have required an overhaul of Intel’s culture and processes for the sake of what was then a significantly lower margin opportunity; Intel wasn’t interested, and proceeded to make a ton of money building server chips for the cloud.In fact, though, the company was already fatally wounded. Mobile meant volume, and as the cost of new processes skyrocketed, the need for volume to leverage those costs skyrocketed as well. It was TSMC that met the moment, with Apple’s assistance: the iPhone maker would buy out the first year of every new process advancement, giving TSMC the confidence to invest, and eventually surpass Intel. That, in turn, benefited AMD, Intel’s long-time rival which now fabbed its chips at TSMC, which not only had better processor designs but, for the first time, had a better process, leading to huge gains in the data center. All of that low-level work on ARM, meanwhile, helped make ARM in PCS and in the datacenter viable, putting further pressure on Intel’s core markets.AI was the final blow: not only did Intel not have a competitive product, it also did not have a foundry through which it could have benefitted from the exploding demand for AI chips; making matters worse is the fact that data center spending on GPUs is coming at the expense of traditional server chips, Intel’s core market.Intel’s DeathThe fundamental flaw with Pat Gelsinger’s 2020 return to Intel and his IDM 2.0 plan is that it was a decade too late. Gelsinger’s plan was to become a foundry, with Intel as its first-best customer. The former was the way to participate in mobile and AI and gain the volume necessary to push technology forward, which Intel has always done better than anyone else (EUV was the exception to the rule that Intel invents and introduces every new advance in processor technology); the latter was the way to fund the foundry and give it guaranteed volume.Again, this is exactly what Intel should have done a decade ago, while TSMC was still in their rear-view mirror in terms of processing technology, and when its products were still dominant in PCs and the data center. By the time Gelsinger came on board, though, it was already too late: Intel’s process was behind, its product market share was threatened on all of the fronts noted above, and high-performance ARM processors had been built by TSMC for years (which meant a big advantage in terms of pre-existing IP, design software, etc.). Intel brought nothing to the table as a foundry other than being a potential second source to TSMC, which, to make matters worse, has dramatically increased its investment in leading edge nodes to absorb that skyrocketing demand. Intel’s products, meanwhile, are either non-competitive (because they are made by Intel) or not-very-profitable (because they are made by TSMC), which means that Intel is simply running out of cash.Given this, you can make the case that Gelsinger was never the right person for the job; shortly after he took over I wrote in Intel Problems that the company needed to be split up, but he told me in a 2022 Stratechery Interview that he — and the board — weren’t interested in that:So last week, AMD briefly passed Intel in market value, and I think Nvidia did a while ago, and neither of these companies build their own chips. It’s kind of like an inverse of the Jerry Sanders quote about “Real men have fabs!” When you were contemplating your strategy for Intel as you came back, how much consideration was there about going the same path, becoming a fabless company and leaning into your design?PG: Let me give maybe three different answers to that question, and these become more intellectual as we go along. The first one was I wrote a strategy document for the board of directors and I said if you want to split the company in two, then you should hire a PE kind of guy to go do that, not me. My strategy is what’s become IDM 2.0 and I described it. So if you’re hiring me, that’s the strategy and 100% of the board asked me to be the CEO and supported the strategy I laid out, of which this is one of the pieces. So the first thing was all of that discussion happened before I took the job as the CEO, so there was no debate, no contemplation, et cetera, this is it.Fast forward to last week, and the Intel board — which is a long-running disaster — is no longer on board, firing Gelsinger in the process. And, to be honest, I noted a couple of months ago that Gelsinger’s plan probably wasn’t going to work without a split and a massive cash infusion from the U.S. government, far in excess of the CHIPS Act.That, though, doesn’t let the board off the hook: not only are they abandoning a plan they supported, their ideas for moving Intel forward are fundamentally wrong. Chairman Frank Yeary, who has inexplicable been promoted despite being present for the entirety of the Intel disaster, said in Intel’s press release about Gelsinger’s departure:While we have made significant progress in regaining manufacturing competitiveness and building the capabilities to be a world-class foundry, we know that we have much more work to do at the company and are committed to restoring investor confidence. As a board, we know first and foremost that we must put our product group at the center of all we do. Our customers demand this from us, and we will deliver for them. With MJ’s permanent elevation to CEO of Intel Products along with her interim co-CEO role of Intel, we are ensuring the product group will have the resources needed to deliver for our customers. Ultimately, returning to process leadership is central to product leadership, and we will remain focused on that mission while driving greater efficiency and improved profitability.Intel’s products are irrelevant to the future; that’s the fundamental foundry problem. If x86 still mattered, then Intel would be making enough money to fund its foundry efforts. Moreover, prospective Intel customers are wary that Intel — as it always has — will favor itself at the expense of its customers; the board is saying that is exactly what they want to do.In fact, it is Intel’s manufacturing that must be saved. This is a business that yes, needs billions upon billions of dollars in funding, but it not only has a market as a TSMC competitor, but also the potential to lead that market in the long run. Moreover, Intel foundry existing is critical to national security: currently the U.S. is completely dependent on TSMC and Taiwan and all of the geopolitical risk that entails. That means it will fall on the U.S. government to figure out a solution.Saving IntelLast month, in A Chance to Build, I explained how tech has modularized itself over the decades, with hardware — including semiconductor fabrication — largely being outsourced to Asia, while software is developed in the U.S. The economic forces undergirding this modularization, including the path dependency from the past sixty years, will be difficult to overcome, even with tariffs.Apple could not only not manufacture an iPhone in the U.S. because of cost, it also can’t do so because of capability; that capability is downstream of an ecosystem that has developed in Asia and a long learning curve that China has traveled and that the U.S. has abandoned. Ultimately, though, the benefit to Apple has been profound: the company has the best supply chain in the world, centered in China, that gives it the capability to build computers on an unimaginable scale with maximum quality for not that much money at all. This benefit has extended to every tech company, whether they make their own hardware or not. Software has to run on something, whether that be servers or computers or phones; hardware is software’s most essential complement.The inverse may be the key to American manufacturing: software as hardware’s grantor of viability through integration. This is what Tesla did: the company is deeply integrated from software down through components, and builds vehicles in California (of course it has an even greater advantage with its China factory).This is also what made Intel profitable for so long: the company’s lock-in was predicated on software, which allowed for massive profit margins that funded all of that innovation and leading edge processes in America, even as every other part of the hardware value chain went abroad. And, by extension, the reason why a product focus is a dead end for the company is because nothing is preserving x86 other than the status quo.It follows, then, that if the U.S. wants to make Intel viable, it ideally will not just give out money, but also a point of integration. To that end, consider this report from Reuters:A U.S. congressional commission on Tuesday proposed a Manhattan Project-style initiative to fund the development of AI systems that will be as smart or smarter than humans, amid intensifying competition with China over advanced technologies. The bipartisan U.S.-China Economic and Security Review Commission stressed that public-private partnerships are key in advancing artificial general intelligence, but did not give any specific investment strategies as it released its annual report.To quote the report’s recommendation directly:The Commission recommends:
1. Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as systems that are as good as or better than human capabilities across all cognitive domains and would surpass the sharpest human minds at every task. Among the specific actions the Commission
  recommends for Congress:
- Provide broad multiyear contracting authority to the executive branch and associated funding for leading artificial intelligence, cloud, and data center companies and others to advance the stated policy at a pace and scale consistent with the goal of U.S. AGI leadership; and
- Direct the U.S. secretary of defense to provide a Defense Priorities and Allocations System “DX Rating” to items in the artificial intelligence ecosystem to ensure this project receives national priority.
The problem with this proposal is that spending the money via “public-private partnerships” will simply lock-in the current paradigm; I explained in A Chance to Build:Software runs on hardware, and here Asia dominates. Consider AI:
- Chip design, a zero marginal cost activity, is done by Nvidia, a Silicon Valley company.
- Chip manufacturing, a minimal marginal cost activity that requires massive amounts of tacit knowledge gained through experience, is done by TSMC, a Taiwanese company.
- An AI system contains multiple components beyond the chip, many if not most of which are manufactured in China, or other countries in Asia.
- Final assembly generally happens outside of China due to U.S. export controls; Foxconn, for example, assembles many of its systems in Mexico.
- AI is deployed mostly by U.S. companies, and the vast majority of application development is done by tech companies and startups, primarily in Silicon Valley.
The fact that the U.S. is the bread in the AI sandwich is no accident: those are the parts of the value chain where marginal cost is non-existent and where the software talent has the highest leverage. Similarly, it’s no accident that the highest value add in terms of hardware happens in Asia, where expertise has been developing for fifty years. The easiest — and by extension, most low-value — aspect is assembly, which can happen anywhere labor is cheap.Given this, if the U.S. is serious about AGI, then the true Manhattan Project — doing something that will be very expensive and not necessarily economically rational — is filling in the middle of the sandwich. Saving Intel, in other words.Start with the fact that we know that leading AI model companies are interested in dedicated chips; OpenAI is reportedly working on its own chip with Broadcom, after flirting with the idea of building its own fabs. The latter isn’t viable for a software company in a world where TSMC exists, but it is for the U.S. government if it’s serious about domestic capabilities continuing to exist. The same story applies to Google, Amazon, Microsoft, and Meta.To that end, the U.S. government could fund an independent Intel foundry — spin out the product group along with the clueless board to Broadcom or Qualcomm or private equity — and provide price support for model builders to design and buy their chips there. Or, if the U.S. government wanted to build the whole sandwich, it could directly fund model builders — including one developed in-house — and dictate that they not just use but deeply integrate with Intel-fabricated integrated chips (it’s not out of the question that a fully integrated stack might actually be the optimal route to AGI).It would, to be sure, be a challenge to keep such an effort out of the clutches of the federal bureaucracy and the dysfunction that has befallen the U.S. defense industry. It would be essential to give this effort the level of independence and freedom that the original Manhattan Project had, with compensation packages to match; perhaps this would be a better use of Elon Musk’s time — himself another model builder — than DOGE?This could certainly be bearish for Nvidia, at least in the long run. Nvidia is a top priority for TSMC, and almost certainly has no interest in going anywhere else; that’s also why it would be self-defeating for a U.S. “Manhattan Project” to simply fund the status quo, which is Nvidia chips manufactured in Taiwan. Competition is ok, though; the point isn’t to kill TSMC, but to stand up a truly domestic alternative (i.e. not just a fraction of non-leading edge capacity in Arizona). Nvidia for its part deserves all of the success it is enjoying, but government-funded alternatives would ultimately manifest for consumers and businesses as lower prices for intelligence.This is all pretty fuzzy, to be clear. What does exist, however, is a need — domestically sourced and controlled AI, which must include chips — and a company, in Intel, that is best placed to meet that need, even as it needs a rescue. Intel lost its reason to exist, even as the U.S. needs it to exist more than ever; AI is the potential integration point to solve both problems at the same time.https://www.youtube.com/embed/HGPdfV7evFU?si=nNUkDNbAE90x94jc&controls=0
The Gen AI Bridge to the FutureMonday, December 2, 2024Listen to Podcast Watch on YouTubeListen to this post:Log in to listenIn the beginning was the mainframe.In 1945 the U.S. government built ENIAC, an acronym for Electronic Numerical Integrator and Computer, to do ballistics trajectory calculations for the military; World War 2 was nearing its conclusion, however, so ENIAC’s first major job was to do calculations that undergirded the development of the hydrogen bomb. Six years later, J. Presper Eckert and John Mauchly, who led the development of ENIAC, launched UNIVAC, the Universal Automatic Computer, for broader government and commercial applications. Early use cases included calculating the U.S. census and assisting with calculation-intensive back office operations like payroll and bookkeeping.These were hardly computers as we know them today, but rather calculation machines that took in reams of data (via punch cards or magnetic tape) and returned results according to hardwired calculation routines; the “operating system” were the humans actually inputting the data, scheduling jobs, and giving explicit hardware instructions. Originally this instruction also happened via punch cards and magnetic tape, but later models added consoles to both provide status and also allow for register-level control; these consoles evolved into terminals, but the first versions of these terminals, like the one that was available for the original version of the IBM System/360, were used to initiate batch programs.Any recounting of computing history usually focuses on the bottom two levels of that stack — the device and the input method — because they tend to evolve in parallel. For example, here are the three major computing paradigms to date:These aren’t perfect delineations; the first PCs had terminal-like interfaces, and pre-iPhone smartphones used windows-icons-menus-pointer (WIMP) interaction paradigms, with built-in keyboards and styluses. In the grand scheme of things, though, the distinction is pretty clear, and, by extension, it’s pretty easy to predict what is next:Wearables is an admittedly broad category that includes everything from smart watches to earpieces to glasses, but I think it is a cogent one: the defining characteristic of all of these devices, particularly in contrast to the three previous paradigms, is the absence of a direct mechanical input mechanism; that leaves speech, gestures, and at the most primitive level, thought.Fortunately there is good progress being made on on all of these fronts: the quality and speed of voice interaction has increased dramatically over the last few years; camera-intermediated gestures on the Oculus and Vision Pro work well, and Meta’s Orion wristband uses electromyography (EMG) to interpret gestures without any cameras at all. Neuralink is even more incredible: an implant in the brain captures thoughts directly and translates them into actions.These paradigms, however, do not exist in isolation. First off, mainframes still exist, and I’m typing this Article on a PC, even if you may consume it on a phone or via a wearable like a set of AirPods. What stands out to me, however, is the top level of the initial stack I illustrated above: the application layer on one paradigm provides the bridge to the next one. This, more than anything, is why generative AI is a big deal in terms of realizing the future.Bridges to the FutureI mentioned the seminal IBM System/360 above, which was actually a family of mainframes; the first version was the Model 30, which, as I noted, did batch processing: you would load up a job using punch cards or magnetic tape and execute the job, just like you did with the ENIAC or UNIVAC. Two years later, however, IBM came out with the Model 67 and the TSS/360 operating system: now you could actually interact with a program via the terminal. This represented a new paradigm at the application layer:It is, admittedly, a bit confusing to refer to this new paradigm at the application layer as Applications, but it is the most accurate nomenclature; what differentiated an application from a program was that while the latter was a pre-determined set of actions that ran as a job, the former could be interacted with and amended while running.That new application layer, meanwhile, opened up the possibility for an entirely new industry to create those applications, which could run across the entire System/360 family of mainframes. New applications, in turn, drove demand for more convenient access to the computer itself. This ultimately led to the development of the personal computer (PC), which was an individual application platform:Initial PCs operated from a terminal-like text interface, but truly exploded in popularity with the roll-out of the WIMP interface, which was invented by Xerox PARC, commercialized by Apple, and disseminated by Microsoft. The key point in terms of this Article, however, is that Applications came first: the concept created the bridge from mainframes to PCs.PCs underwent their own transformation over their two decades of dominance, first in terms of speed and then in form factor, with the rise of laptops. The key innovation at the application layer, however, was the Internet:The Internet differed from traditional applications by virtue of being available on every PC, facilitating communication between PCs, and by being agnostic to the actual device it was accessed on. This, in turn, provided the bridge to the next device paradigm, the smartphone, with its touch interface:I’ve long noted that Microsoft did not miss mobile; their error was in trying to extend the PC paradigm to mobile. This not only led to a focus on the wrong interface (WIMP via stylus and built-in keyboard), but also an assumption that the application layer, which Windows dominated, would be a key differentiator.Apple, famously, figured out the right interface for the smartphone, and built an entirely new operating system around touch. Yes, iOS is based on macOS at a low level, but it was a completely new operating system in a way that Windows Mobile was not; at the same time, because iOS was based on macOS, it was far more capable than smartphone-only alternatives like BlackBerry OS or PalmOS. The key aspect of this capability was that the iPhone could access the real Internet.What is funny is that Steve Jobs’ initial announcement of this capability was met with much less enthusiasm than the iPhone’s other two selling points of being a widescreen iPod and a mobile phone:https://videopress.com/embed/3OlG8bv1?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataToday, we’re introducing three revolutionary products of this class. The first one is a wide-screen iPod with touch controls. The second is a revolutionary mobile phone. The third is a breakthrough Internet communications device…These are not three separate devices, this is one device, and we are calling iPhone. Today, Apple is going to reinvent the phone.I’ve watched that segment hundreds of times, and the audience’s confusion at “Internet communications device” cracks me up every time; in fact, that was the key factor in reinventing the phone, because it was the bridge that linked a device in your pocket to the world of computing writ large, via the Internet. Jobs listed the initial Internet features later on in the keynote:https://videopress.com/embed/yYR6Pz3q?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataNow let’s take a look at an Internet communications device, part of the iPhone. What’s this all about? Well, we’ve got some real breakthroughs here: to start off with, we’ve got rich HTML email on iPhone. The first time, really rich email on a mobile device, and it works with any IMAP or POP email service. You’ve got your favorite mail service, it’ll likely work with it, and it’s rich text email. We wanted the best web browser on our phone, not a baby browser or a WAP browser, a real browser, and we picked the best one in the world: Safari, and we have Safari running on iPhone. It is the first fully-usable HTML browser on a phone. Third, we have Google Maps. Maps, satellite images, directions, and traffic. This is unbelievable, wait until you see it. We have Widgets, starting off with weather and stocks. And, this communicates with the Internet over Edge and Wifi, and iPhone automatically detects Wifi and switches seamless to it. You don’t have to manage the network, it just does the right thing.Notice that the Internet is not just the web; in fact, while Apple wouldn’t launch a 3rd-party App Store until the following year, it did, with the initial iPhone, launch the app paradigm which, in contrast to standalone Applications from the PC days, assumed and depended on the Internet for functionality.The Generative AI BridgeWe already established above that the next paradigm is wearables. Wearables today, however, are very much in the pre-iPhone era. On one hand you have standalone platforms like Oculus, with its own operating system, app store, etc.; the best analogy is a video game console, which is technically a computer, but is not commonly thought of as such given its singular purpose. On the other hand, you have devices like smart watches, AirPods, and smart glasses, which are extensions of the phone; the analogy here is the iPod, which provided great functionality but was not a general computing device.Now Apple might dispute this characterization in terms of the Vision Pro specifically, which not only has a PC-class M2 chip, along with its own visionOS operating system and apps, but can also run iPad apps. In truth, though, this makes the Vision Pro akin to Microsoft Mobile: yes, it is a capable device, but it is stuck in the wrong paradigm, i.e. the previous one that Apple dominated. Or, to put it another way, I don’t view “apps” as the bridge between mobile and wearables; apps are just the way we access the Internet on mobile, and the Internet was the old bridge, not the new one.To think about the next bridge, it’s useful to jump forward to the future and work backwards; that jump forward is a lot easier to envision, for me anyways, thanks to my experience with Meta’s Orion AR glasses:They’re real and they’re spectacular. pic.twitter.com/hIJZuS6taY— Ben Thompson (@benthompson) September 25, 2024The most impressive aspect of Orion is the resolution, which is perfect. I’m referring, of course, to the fact that you can see the real world with your actual eyes; I wrote in an Update:The reality is that the only truly satisfactory answer to passthrough is to not need it at all. Orion has perfect field-of-view and infinite resolution because you’re looking at the real world; it’s also dramatically smaller and lighter. Moreover, this perfect fidelity actually gives more degrees of freedom in terms of delivering the AR experience: no matter how high resolution the display is, it will still be lower resolution than the world around it; I tried a version of Orion with double the resolution and, honestly, it wasn’t that different, because the magic was in having augmented reality at all, not in its resolution. I suspect the same thing applies to field of view: 70 degrees seemed massive on Orion, even though that is less than the Vision Pro’s 100 degrees, because the edge of the field of view for Orion was reality, whereas the edge for the Vision Pro is, well, nothing.The current iteration of Orion’s software did have an Oculus-adjacent launch screen, and an Instagram prototype; it was, in my estimation, the least impressive part of the demonstration, for the same reason that I think the Vision Pro’s iPad app compatibility is a long-term limitation: it was simply taking the mobile paradigm and putting it in front of my face, and honestly, I’d rather just use my phone.One of the most impressive demos, meanwhile, had the least UI: it was just a notification. I glanced up, saw that someone was calling me, touched my fingers together to “click” on the accept button that accompanied the notification, and was instantly talking to someone in another room while still being able to interact freely with the world around me. Of course phone calls aren’t some sort of new invention; what made the demo memorable was that I only got the UI I needed when I needed it.This, I think, is the future: the exact UI you need — and nothing more — exactly when you need it, and at no time else. This specific example was, of course, programmed deterministically, but you can imagine a future where the glasses are smart enough to generate UI on the fly based on the context of not just your request, but also your broader surroundings and state.This is where you start to see the bridge: what I am describing is an application of generative AI, specifically to on-demand UI interfaces. It’s also an application that you can imagine being useful on devices that already exist. A watch application, for example, would be much more usable if, instead of trying to navigate by touch like a small iPhone, it could simply show you the exact choices you need to make at a specific moment in time. Again, we get hints of that today through deterministic programming, but the ultimate application will be on-demand via generative AI.Of course generative AI is also usable on the phone, and that is where I expect most of the exploration around generative UI to happen for now. We certainly see plenty of experimentation and rapid development of generative AI broadly, just as we saw plenty of experimentation and rapid development of the Internet on PCs. That experimentation and development was not just usable on the PC, but it also created the bridge to the smartphone; I think that generative AI is doing the same thing in terms of building a bridge to wearables that are not accessories, but general purpose computers in their own right:This is exciting in the long-term, and bullish for Meta (and I’ve previously noted how generative AI is the key to the metaverse, as well). It’s also, clearly, well into the future. It also helps explain why Orion isn’t shipping today: it’s not just that the hardware isn’t yet in a production state, particularly from a cost perspective, but the entire application layer needs to be built out, first on today’s devices, enabling the same sort of smooth transition that the iPhone had. No, Apple didn’t have the App Store, but the iPhone was extraordinarily useful on day one, because it was an Internet Communicator.Survey CompleteTen years ago I wrote a post entitled The State of Consumer Technology in 2014, where I explored some of the same paradigm-shifts I detailed in this Article. This was the illustration I made then:There is a perspective in which 2024 has been a bit of a letdown in terms of generative AI; there hasn’t been a GPT-5 level model released; the more meaningful developments have been in the vastly increased efficiency and reduction in size of GPT-4 level models, and the inference-scaling possibilities of o1. Concerns are rising that we may have hit a data wall, and that there won’t be more intelligent AI without new fundamental breakthroughs in AI architecture.I, however, feel quite optimistic. To me the story of 2024 has been filling in those question marks in that illustration. The product overhang from the generative AI capabilities we have today are absolutely massive: there are so many new things to be built, and completely new application layer paradigms are at the top of the list. That, by extension, is the bridge that will unlock entirely new paradigms of computing. The road to the future needs to be built; it’s exciting to have the sense that the surveying is now complete.https://www.youtube.com/embed/nbMUysaXFZ0?si=SLxDBK-agQs6HT7d&controls=0
A Chance to BuildMonday, November 18, 2024Listen to PodcastWatch on YouTubeListen to this post:Log in to listenSemiconductors are so integral to the history of Silicon Valley that they give the region its name, and, more importantly, its culture: chips require huge amounts of up-front investment, but they have, relative to most other manufactured goods, minimal marginal costs; this economic reality helped drive the development of the venture capital model, which provided unencumbered startup capital to companies who could earn theoretically unlimited returns at scale. This model worked even better with software, which was perfectly replicable.That history starts in 1956, when William Shockley founded the Shockley Semiconductor Laboratory to commercialize the transistor that he had helped invent at Bell Labs; he chose Mountain View to be close to his ailing mother. A year later the so-called “Traitorous Eight”, led by Robert Noyce, left and founded Fairchild Semiconductor down the road. Six years after that Fairchild Semiconductor opened a facility in Hong Kong to assemble and test semiconductors. Assembly required manually attaching wires to a semiconductor chip, a labor-intensive and monotonous task that was difficult to do economically with American wages, which ran about $2.50/hour; Hong Kong wages were a tenth of that. Four years later Texas Instruments opened a facility in Taiwan, where wages were $0.19/hour; two years after that Fairchild Semiconductor opened another facility in Singapore, where wages were $0.11/hour.In other words, you can make the case that the classic story of Silicon Valley isn’t completely honest. Chips did have marginal costs, but that marginal cost was, within single digit years of the founding of Silicon Valley, exported to Asia.Moreover, that exportation was done with the help of the U.S. government. In 1962 the U.S. Congress passed the Tariff Classification Act of 1962, which amended the Tariff Act of 1930 to implement new tariff schedules developed by the United States Tariff Commission; those new schedules were implemented in 1963, and included Tariff Item 807.00, which read:Articles assembled abroad in whole or in part of products of the United States which were exported for such purpose and which have not been advanced in value or improved in condition abroad by any means other than by the act of assembly:
- A duty upon the full value of the imported article, less the cost or value of such products of the United States.
The average Hong Kong worker assembled around 24 chips per hour; that meant their value add to the overall cost of the chip was just over $0.01, which means that tariffs were practically non-existent. This was by design! Chris Miller writes in Chip War:South Vietnam would send shock waves across Asia. Foreign policy strategists perceived ethnic Chinese communities all over the region as ripe for Communist penetration, ready to fall to Communist influence like a cascade of dominoes. Malaysia’s ethnic Chinese minority formed the backbone of that country’s Communist Party, for example. Singapore’s restive working class was majority ethnic Chinese. Beijing was searching for allies—and probing for U.S. weakness…By the end of the 1970s, American semiconductor firms employed tens of thousands of workers internationally, mostly in Korea, Taiwan, and Southeast Asia. A new international alliance emerged between Texan and Californian chipmakers, Asian autocrats, and the often ethnic-Chinese workers who staffed many of Asia’s semiconductor assembly facilities.Semiconductors recast the economies and politics of America’s friends in the region. Cities that had been breeding grounds for political radicalism were transformed by diligent assembly line workers, happy to trade unemployment or subsistence farming for better paying jobs in factories. By the early 1980s, the electronics industry accounted for 7 percent of Singapore’s GNP and a quarter of its manufacturing jobs. Of electronics production, 60 percent was semiconductor devices, and much of the rest was goods that couldn’t work without semiconductors. In Hong Kong, electronics manufacturing created more jobs than any sector except textiles. In Malaysia, semiconductor production boomed in Penang, Kuala Lumpur, and Melaka, with new manufacturing jobs providing work for many of the 15 percent of Malaysian workers who had left farms and moved to cities between 1970 and 1980. Such vast migrations are often politically destabilizing, but Malaysia kept its unemployment rate low with many relatively well-paid electronics assembly jobs.This is a situation that, at least in theory, should not persist indefinitely; increased demand for Asian labor should push up both the cost of that labor and also the currency of the countries where that labor is in demand, making those countries less competitive over time. The former has certainly happened: Taiwan, where I live, is one of the richest countries in the world. And yet chip-making is centered here to a greater extent than ever before, in seeming defiance of theory.The Post-War OrderThe problem with theory is usually reality; in 1944 the U.S. led the way in establishing what came to be known as the Bretton Woods System, which pegged exchange rates to the U.S. dollar. This was a boon to the devastated economies of Europe and first Japan, and then later the rest of Asia: an influx of U.S. capital rebuilt their manufacturing capability by leveraging their relatively lower cost of labor. This did raise labor costs, but thanks to the currency peg, the U.S. currency couldn’t depreciate in response, which in turn made U.S. debt much more attractive than it might have been otherwise for those manufacturing profits, which in turn helped to fund both the Vietnam War and the 1960’s expansion in social programs.Ultimately this pressure on the U.S. dollar was too intense, leading to the dissolution of Bretton Woods in 1971 and a depreciation of the U.S. dollar relative to gold; the overall structure of the world economy, however, was set: trade was denominated in dollars — i.e. the U.S. dollar was the world’s reserve currency — which kept its value higher than economic theory would dictate. This made U.S. debt attractive, which funded deficit spending; that spending fueled the U.S. consumer market, which bought imported manufactured goods; the profits of those goods were reinvested into U.S. debt, which helped pay for the military that kept the entire system secure.The biggest winner was the U.S. consumer. Money was cycled into the economy through an impressive and seemingly impossible array of service sector jobs and quickly spent on cheap imports. Those cheap imports were getting better too: to take chips as an example, increased automation decreased costs, and the development of software made those chips much more valuable. This applied not just to the chips directly, but everything built with and enabled by them; the actual building of electronics happened in Asia, as countries rapidly ascended the technological ladder, but the software was the province of Silicon Valley.This is where it matters that software is truly a zero marginal cost product. R&D costs for tech companies have skyrocketed for decades, but that increase has been more than offset by the value created by software and captured by scale. Moreover, those increasing costs manifested as the highest salaries in the world for talent, the true scarce resource in technology. This meant that the most capable technologists made their way to the U.S. generally and Silicon Valley specifically to earn the most money, and, if they had the opportunity and drive, to create new companies as software ate the world.Still, software runs on hardware, and here Asia dominates. Consider AI:
- Chip design, a zero marginal cost activity, is done by Nvidia, a Silicon Valley company.
- Chip manufacturing, a minimal marginal cost activity that requires massive amounts of tacit knowledge gained through experience, is done by TSMC, a Taiwanese company.
- An AI system contains multiple components beyond the chip, many if not most of which are manufactured in China, or other countries in Asia.
- Final assembly generally happens outside of China due to U.S. export controls; Foxconn, for example, assembles many of its systems in Mexico.
- AI is deployed mostly by U.S. companies, and the vast majority of application development is done by tech companies and startups, primarily in Silicon Valley.
The fact that the U.S. is the bread in the AI sandwich is no accident: those are the parts of the value chain where marginal cost is non-existent and where the software talent has the highest leverage. Similarly, it’s no accident that the highest value add in terms of hardware happens in Asia, where expertise has been developing for fifty years. The easiest — and by extension, most low-value — aspect is assembly, which can happen anywhere labor is cheap.All of this has happened in a world where the trend in trade was towards more openness and fewer barriers, at least in terms of facilitating this cycle. One key development was the Information Technology Agreement (ITA), a 1996 World Trade Organization agreement, which completely eliminated tariffs on IT products, including chips. The Internet, meanwhile, meant there were no barriers to the spread of software, with the notable exception of China’s Great Firewall; the end result is that while U.S. software ran on Asian hardware, it was U.S. companies that ultimately reaped the largest returns from scale.Cars and ChinaPerhaps the defining characteristic of the Clinton-Bush-Obama era was the assumption that this system would continue forever; it certainly was plausible in terms of products. Consider cars: for a hundred years cars were marvelous mechanical devices with tens of thousands of finely engineered parts predicated on harnessing the power of combustion to transport people and products wherever they wished to go. Electric cars, however, are something else entirely: yes, there is still a mechanical aspect, as there must be to achieve movement in physical space, but the entire process is predicated on converting electricity to mechanical movement, and governed entirely by chips and software.This products looks a lot more like a computer on wheels than the mechanical cars we are familiar with; it follows, then, that the ultimate structure of the car industry might end up looking something like the structure of AI: the U.S. dominates the zero marginal cost components like design and the user experience, while Asia — China specifically, given the scale and labor requirements — dominates manufacturing.This has been Waymo’s plan; while current self-driving cars on the road are retrofitted Jaguar I-Pace sedans, the 6th-generation Waymo vehicle is manufactured by Chinese car company Geely. This car, called Zeekr, is purpose-built for transportation, but ideally it would be custom-built for Waymo’s purposes: you can imagine future fleets of self-driving cars with designs for different use cases, from individual taxis to groups to working offices to sleeper cars. The analogy here would be to personal computing devices: you can get a computer in rack form, a desktop, a laptop, or a phone; the chips and software are by-and-large the same.Cars aren’t there yet, but they’re not far off; the relative simplicity of electric cars makes it more viable for established car manufactures to basically offer customizable platforms: that is how a company like Xiaomi can develop its own SUV. The consumer electronics company, most well-known for its smartphones, contracts with Beijing Automotive Group for manufacturing, while doing the design and technological integration. Huawei has a similar arrangement with Seres, Changan, and Chery Automobile.Tesla, it should be noted, is a bit different: the company is extremely vertically integrated, building not only its own hardware and software but also a significant number of the components that go into its cars; this isn’t a surprise, given Tesla’s pioneering role in electrical cars (pioneers are usually vertically integrated), but it does mean that Tesla faces a significant long-term threat from the more modular Chinese approach. Given this, it’s not a surprise that Elon Musk is staking Tesla’s long-term future on autonomy, in effect doubling down on the company’s integration.Regardless, what is notable — and ought to be a wake-up call to Silicon Valley — is the fact that the Xiaomi and Huawei cars run Chinese software. One of the under-appreciated benefits of the Great Firewall is that it created an attractive market for software developers that was not reachable from Silicon Valley; this means that while a good number of Chinese software engineers are in the U.S., there is a lot of talent in China as well, and that talent is being applied to products that can leverage Chinese manufacturing to win markets Silicon Valley thought would be theirs to sandwich forever.Waymo’s Zeekr car, meanwhile, has a problem; from Bloomberg in May:President Joe Biden will quadruple tariffs on Chinese electric vehicles and sharply increase levies for other key industries this week, unveiling the measures at a White House event framed as a defense of American workers, people familiar with the matter said. Biden will hike or add tariffs in the targeted sectors after nearly two years of review. The total tariff on Chinese EVs will rise to 102.5% from 27.5%, the people said, speaking on condition of anonymity ahead of the announcement. Others will double or triple in targeted industries, though the scope remains unclear.Given this, it’s no surprise that Waymo had a new announcement in October: Waymo was partnering with Hyundai for new self-driving cars that are manufactured in America. This car is a retro-fitted IONIQ 5, which is built as a passenger car, unlike the transportation-focused Zeekr; in other words, Google is taking a step back in functionality because of government policy.Trump’s TariffsWaymo may not be the only company taking a step back: newly (re-)elected President Trump’s signature economic proposal is tariffs. From the 2024 GOP Platform:Our Trade deficit in goods has grown to over $1 Trillion Dollars a year. Republicans will support baseline Tariffs on Foreign-made goods, pass the Trump Reciprocal Trade Act, and respond to unfair Trading practices. As Tariffs on Foreign Producers go up, Taxes on American Workers, Families, and Businesses can come down.Foreign Policy published an explainer over the weekend entitled Everything You Wanted to Know About Trump’s Tariffs But Were Afraid to Ask:U.S. President-elect Donald Trump, the self-proclaimed “tariff man,” campaigned on the promise of ratcheting import duties as high as 60 percent against all goods from China, and perhaps 20 percent on everything from everywhere else. And he might be able to do it—including by drawing on little-remembered authorities from the 1930 Smoot-Hawley Tariff Act, the previous nadir of U.S. trade policy.Trump’s tariff plans are cheered by most of his economic advisers, who see them as a useful tool to rebalance an import-dependent U.S. economy. Most economists fear the inflationary impacts of sharply higher taxes on U.S. consumers and businesses, as well as the deliberate drag on economic growth that comes from making everything more expensive. Other countries are mostly confused, uncertain whether Trump’s tariff talk is just bluster to secure favorable trade deals for the United States, or if they’ll be more narrowly targeted or smaller than promised. Big economies, such as China and the European Union, are preparing their reprisals, just in case.What makes it hard for economists to model and other countries to understand is that nobody, even in Trump world, seems to know exactly why tariffs are on the table.Sounds like the explainer needs an explainer! Or maybe the author was afraid to ask, but I digress.The story to me seems straightforward: the big loser in the post World War 2 reconfiguration I described above was the American worker; yes, we have all of those service jobs, but what we have much less of are traditional manufacturing jobs. What happened to chips in the 1960s happened to manufacturing of all kinds over the ensuing decades. Countries like China started with labor cost advantages, and, over time, moved up learning curves that the U.S. dismantled; that is how you end up with this from Walter Isaacson in his Steve Jobs biography about a dinner with then-President Obama:When Jobs’s turn came, he stressed the need for more trained engineers and suggested that any foreign students who earned an engineering degree in the United States should be given a visa to stay in the country. Obama said that could be done only in the context of the “Dream Act,” which would allow illegal aliens who arrived as minors and finished high school to become legal residents — something that the Republicans had blocked. Jobs found this an annoying example of how politics can lead to paralysis. “The president is very smart, but he kept explaining to us reasons why things can’t get done,” he recalled. “It infuriates me.”Jobs went on to urge that a way be found to train more American engineers. Apple had 700,000 factory workers employed in China, he said, and that was because it needed 30,000 engineers on-site to support those workers. “You can’t find that many in America to hire,” he said. These factory engineers did not have to be PhDs or geniuses; they simply needed to have basic engineering skills for manufacturing. Tech schools, community colleges, or trade schools could train them. “If you could educate these engineers,” he said, “we could move more manufacturing plants here.” The argument made a strong impression on the president. Two or three times over the next month he told his aides, “We’ve got to find ways to train those 30,000 manufacturing engineers that Jobs told us about.”I think that Jobs had cause-and-effect backwards: there are not 30,000 manufacturing engineers in the U.S. because there are not 30,000 manufacturing engineering jobs to be filled. That is because the structure of the world economy — choices made starting with Bretton Woods in particular, and cemented by the removal of tariffs over time — made them nonviable. Say what you will about the viability or wisdom of Trump’s tariffs, the motivation — to undo eighty years of structural changes — is pretty straightforward!The other thing about Jobs’ answer is how ultimately self-serving it was. This is not to say it was wrong: Apple could not only not manufacture an iPhone in the U.S. because of cost, it also can’t do so because of capability; that capability is downstream of an ecosystem that has developed in Asia and a long learning curve that China has traveled and that the U.S. has abandoned. Ultimately, though, the benefit to Apple has been profound: the company has the best supply chain in the world, centered in China, that gives it the capability to build computers on an unimaginable scale with maximum quality for not that much money at all.This benefit has extended to every tech company, whether they make their own hardware or not. Software has to run on something, whether that be servers or computer or phones; hardware is software’s most essential complement. Joel Spolsky, in his canonical post about commoditizing your complements, wrote:Every product in the marketplace has substitutes and complements…A complement is a product that you usually buy together with another product. Gas and cars are complements. Computer hardware is a classic complement of computer operating systems. And babysitters are a complement of dinner at fine restaurants. In a small town, when the local five star restaurant has a two-for-one Valentine’s day special, the local babysitters double their rates. (Actually, the nine-year-olds get roped into early service.)…Demand for a product increases when the price of its complements decreases. In general, a company’s strategic interest is going to be to get the price of their complements as low as possible. The lowest theoretically sustainable price would be the “commodity price” — the price that arises when you have a bunch of competitors offering indistinguishable goods…If you can run your software anywhere, that makes hardware more of a commodity. As hardware prices go down, the market expands, driving more demand for software (and leaving customers with extra money to spend on software which can now be more expensive.)…Spolsky’s post was written in 2002, well before the rise of smartphones and, more pertinently, ad-supported software that now permeates our world. That, though, only makes his point: hardware has become so cheap and so widespread that software can be astronomically valuable even as it’s free to end users. Which, by the way, is that other part of the boon to consumers I noted above.It’s Time to BuildA mistake many analysts make, particularly Americans, is viewing the U.S. as the only agent of change in the world; events like the Ukraine or Gaza wars are reminders that we aren’t in control of world events, and nothing would make that lesson clearer than a Chinese move on Taiwan. At the same time, we are living in a system the U.S. built, so it’s worth thinking seriously about the implications of a President with a mandate to blow the whole thing up.The first point is perhaps the most comforting: there is a good chance that Trump makes a lot of noise and accomplishes little, at least in terms of trade and — pertinently for this blog — its impact on tech. That is arguably what happened his first term: there were China tariffs (that Apple was excluded from), and a ban on chip shipments to Huawei (that massively buoyed Apple), and TSMC committed to building N-1 fabs in Arizona. From a big picture perspective, though, today Silicon Valley is more powerful and richer than ever, and the hardware dominance of Asia generally and China specifically is larger than ever.The reality is that uprooting the current system would take years of upheaval and political and economic pain; those who argue it is impossible are wrong, but believing it’s highly improbable is very legitimate. Indeed, it may be the case that systems can only truly be remade in the presence of an exogenous destructive force, which is to say war.The second point, though, is that there does seem to be both more risk and opportunity than many people think. Tariffs do change things; by virtue of my location I talk to plenty of people on the ground who have been busy for years moving factories, not from China to the U.S., but to places like Thailand or Vietnam. That doesn’t really affect the trade deficit, but things that matter don’t always show up in aggregate numbers.To that end, the risk for tech is that tariffs specifically and Trump’s approach to trade generally do more damage to the golden goose than expected. More expensive hardware ultimately constricts the market for software; tariffs in violation of agreements like the ITA give the opening for other countries to impose levies of their own, and U.S. tech companies could very well be popular targets.The opportunity, meanwhile, is to build new kinds of manufacturing companies that can seize on a tariff-granted price advantage. These sorts of companies, perhaps to Trump’s frustration, are not likely to be employment powerhouses; the real opportunity is taking advantage of robotics and AI to make physical goods into zero marginal cost items in their own right (outside of commodities; this is what has happened to chips: assembly and testing are fully automated, which makes a U.S. buildout viable).To take a perhaps unintuitive example, consider Amazon: the company is deeply investing in automation for its fulfillment centers, which decreases the marginal cost of picking, enabling the company to sell more items like “Everyday Essentials” that don’t cost much but are purchased frequently; it’s also no surprise that Amazon is invested in drones and self-driving car startups, to take the same costs out of delivery. It’s a long journey, to be sure, but it’s a destination that is increasingly possible to imagine.The analogy to manufacturing is that a combination of automation and modular platforms, defined by software, is both necessary and perhaps possible to build for the first time in a long time. It won’t be an easy road — see Tesla’s struggles with automation — but there is, at a minimum, a market in national security, and perhaps arenas like self-driving cars, to build something scalable with assumptions around modularity and software-defined functionality at the core.Again, I don’t know if this will work: the symbiotic relationship between Silicon Valley software makers and Asian hardware manufactures is one of the most potent economic combinations in history, and it may be impossible to compete with; if it’s ever going to work, though, the best opportunity — absent a war, God forbid — is probably right now.https://www.youtube.com/embed/6rKj79A1D44?si=JCcOVcN5BG-bTexD&controls=0
Meta’s AI AbundanceTuesday, October 29, 2024Listen to Podcast Watch on YouTubeListen to this post:Log in to listenStratechery has benefited from a Meta cheat code since its inception: wait for investors to panic, the stock to drop, and write an Article that says Meta is fine — better than fine even — and sit back and watch the take be proven correct. Notable examples include 2013’s post-IPO swoon, the 2018 Stories swoon, and most recently, the 2022 TikTok/Reels swoon (if you want a bonus, I was optimistic during the 2020 COVID swoon too):Perhaps with that in mind I wrote a cautionary note earlier this year about Meta and Reasonable Doubt: while investors were concerned about the sustainability of Meta’s spending on AI, I was worried about increasing ad prices and the lack of new formats after Stories and then Reels; the long-term future, particularly in terms of the metaverse, was just as much of a mystery as always.Six months on and I feel the exact opposite: it seems increasingly clear to me that Meta is in fact the most well-placed company to take advantage of generative AI. Yes, investors are currently optimistic, so this isn’t my usual contrarian take — unless you consider the fact that I think Meta has the potential to be the most valuable company in the world. As evidence of that fact I’m writing today, a day before Meta’s earnings: I don’t care if they’re up or down, because the future is that bright.Short-term: Generative AI and Digital AdvertisingGenerative AI is clearly a big deal, but the biggest winner so far is Nvidia, in one of the clearest examples of the picks-and-shovels ethos on which San Francisco was founded: the most money to be made is in furnishing the Forty-niners (yes, I am using a linear scale instead of the log scale above for effect):The big question weighing on investors’ minds is when all of this GPU spend will generate a return. Tesla and xAI are dreaming of autonomy; Azure, Google Cloud, AWS, and Oracle want to undergird the next generation of AI-powered startups; and Microsoft and Salesforce are bickering about how to sell AI into the enterprise. All of these bets are somewhat speculative; what would be the most valuable in the short-term, at least in terms of justifying the massive ongoing capital expenditure necessary to create the largest models, is a guaranteed means to translate those costs into bottom-line benefit.Meta is the best positioned to do that in the short-term, thanks to the obvious benefit of applying generative AI to advertising. Meta is already highly reliant on machine learning for its ads product: right now an advertiser can buy ads based on desired outcomes, whether that be an app install or a purchase, and leave everything else up to Meta; Meta will work across their vast troves of data in a way that is only possible using machine learning-derived algorithms to find the right targets for an ad and deliver exactly the business goals requested.What makes this process somewhat galling for the advertiser is that the more of a black box Meta’s advertising becomes the better the advertising results, even as Meta makes more margin. The big reason for the former is the App Tracking Transparency (ATT)-driven shift in digital advertising to probabilistic models in place of deterministic ones.It used to be that ads shown to users could be perfectly matched to conversions made in 3rd-party apps or on 3rd-party websites; Meta was better at this than everyone else, thanks to its scale and fully built-out ad infrastructure (including SDKs in apps and pixels on websites), but this was a type of targeting and conversion tracking that could be done in some fashion by other entities, whether that be smaller social networks like Snap, ad networks, or even sophisticated marketers themselves.ATT severed that link, and Meta’s business suffered greatly; from a February post-earnings Update:It is worth noting that while the digital ecosystem did not disappear, it absolutely did shrink: [MoffettNathanson’s Michael] Nathanson, in his Meta earnings note, explained what he was driving at with that question:While revenues have recovered, with +22% organic growth in the fourth quarter, we think that the more important driver of the outperformance has been the company’s focus on tighter cost controls. Coming in 2023, Meta CEO Mark Zuckerberg made a New Year’s resolution, declaring 2023 the “Year of Efficiency.” By remaining laser-focused on reining in expense growth as the top line reaccelerated, Meta’s operating margins (excluding restructuring) expanded almost +1,100 bps vs last 4Q, reaching nearly 44%. Harking back to Zuckerberg’s resolution, Meta’s 2023 was, in fact, highly efficient…Putting this in perspective, two years ago, after the warnings on the 4Q 2021 earnings call, we forecasted that Meta Family of Apps would generate $155 billion of revenues and nearly $68 billion of GAAP operating income in 2023. Fast forward to today, and last night Meta reported that Family of Apps delivered only $134.3 billion of revenues ($22 billion below our 2-year ago estimate), yet FOA operating income (adjusted for one-time expenses) was amazingly in-line with that two-year old forecast. For 2024, while we now forecast Family of Apps revenues of $151.2 billion (almost $30 billion below the forecast made on February 2, 2022), our current all-in Meta operating profit estimate of $56.8 billion is also essentially in line. In essence, Meta has emerged as a more profitable (dare we say, efficient) business.That shrunken revenue figure is digital advertising that simply disappeared — in many cases, along with the companies that bought it — in the wake of ATT. The fact that Meta responded by becoming so much leaner, though, was critical to not just surviving ATT, but also laid the groundwork for where the company is going next.Increased company efficiency is a reason to be bullish on Meta, but three years on, the key takeaway from ATT is that it validated my thesis that Meta is anti-fragile. From 2020’s Apple and Facebook:This is a very different picture from Facebook, where as of Q1 2019 the top 100 advertisers made up less than 20% of the company’s ad revenue; most of the $69.7 billion the company brought in last year came from its long tail of 8 million advertisers. This focus on the long-tail, which is only possible because of Facebook’s fully automated ad-buying system, has turned out to be a tremendous asset during the coronavirus slow-down…This explains why the news about large CPG companies boycotting Facebook is, from a financial perspective, simply not a big deal. Unilever’s $11.8 million in U.S. ad spend, to take one example, is replaced with the same automated efficiency that Facebook’s timeline ensures you never run out of content. Moreover, while Facebook loses some top-line revenue — in an auction-based system, less demand corresponds to lower prices — the companies that are the most likely to take advantage of those lower prices are those that would not exist without Facebook, like the direct-to-consumer companies trying to steal customers from massive conglomerates like Unilever.In this way Facebook has a degree of anti-fragility that even Google lacks: so much of its business comes from the long tail of Internet-native companies that are built around Facebook from first principles, that any disruption to traditional advertisers — like the coronavirus crisis or the current boycotts — actually serves to strengthen the Facebook ecosystem at the expense of the TV-centric ecosystem of which these CPG companies are a part.Make no mistake, a lot of these kinds of companies were killed by ATT; the ones that survived, though, emerged into a world where no one other than Meta — thanks in part to a massive GPU purchase the same month the company reached its most-recent stock market nadir — had the infrastructure to rebuild the type of ad system they depended on. This rebuild had to be probabilistic — making a best guess as to the right target, and, more confoundingly, a best guess as to conversion — which is only workable with an astronomical amount of data and an astronomical amount of infrastructure to process that data, such that advertisers could once again buy based on promised results, and have those promises met.Now into this cauldron Meta is adding generative AI. Advertisers have long understood the importance of giving platforms like Meta multiple pieces of creative for ads; Meta’s platform will test different pieces of creative with different audiences and quickly hone in on what works, putting more money behind the best arrow. Generative AI puts this process on steroids: advertisers can provide Meta with broad parameters and brand guidelines, and let the black box not just test out a few pieces of creative, but an effectively unlimited amount. Critically, this generative AI application has a verification function: did the generated ad generate more revenue or less? That feedback function, meanwhile, is data in its own right, and can be leveraged to better target individuals in the future.The second piece to all of this — the galling part I referenced above — is the margin question. The Department of Justice’s lawsuit against Google’s ad business explains why black boxes are so beneficial to big ad platforms:Over time, as Google’s monopoly over the publisher ad server was secured, Google surreptitiously manipulated its Google Ads’ bids to ensure it won more high-value ad inventory on Google’s ad exchange while maintaining its own profit margins by charging much higher fees on inventory that it expected to be less competitive. In doing so, Google was able to keep both categories of inventory out of the hands of rivals by competing in ways that rivals without similar dominant positions could not. In doing so, Google preserved its own profits across the ad tech stack, to the detriment of publishers. Once again, Google engaged in overt monopoly behavior by grabbing publisher revenue and keeping it for itself. Google called this plan “Project Bernanke.”I’m skeptical about the DOJ’s case for reasons I laid out in this Update; publishers made more money using Google’s ad server than they would have otherwise, while the advertisers, who paid more, are not locked in. The black box effect, however, is real: platforms like Google or Meta can meet an advertiser’s goals — at a price point determined by an open auction — without the advertisers knowing which ads worked and which ones didn’t, keeping the margin from the latter. The galling bit is that this works out best for everyone: these platforms are absolutely finding customers you wouldn’t get otherwise, which means advertisers earn more when the platforms earn more too, and these effects will only be supercharged with generative ads.There’s more upside for Meta, too. Google and Amazon will benefit from generative ads, but I expect the effect will be the most powerful at the top of the funnel where Meta’s advertising operates, as opposed to the bottom-of-the-funnel search ads where Amazon and Google make most of their money. Moreover, there is that long tail I mentioned above: one of the challenges for Meta in moving from text (Feed) to images (Stories) to video (Reels) is that effective creative becomes more difficult to execute, especially if you want multiple variations. Meta has devoted a lot of resources over the years to tooling to help advertisers make effective ads, much of which will be obviated by generative AI. This, by extension, will give long tail advertisers more access to more inventory, which will increase demand and ultimately increase prices.There is one more channel that is exclusive to Meta: text-to-message ads. These are ads where the conversion event is initiating a chat with an advertiser, an e-commerce channel that is particularly popular in Asia. The distinguishing factor in the markets where these ads are taking off is low labor costs, which AI addresses. Zuckerberg explained in a 2023 earnings call:And then the one that I think is going to have the fastest direct business loop is going to be around helping people interact with businesses. You can imagine a world on this where over time, every business has as an AI agent that basically people can message and interact with. And it’s going to take some time to get there, right? I mean, this is going to be a long road to build that out. But I think that, that’s going to improve a lot of the interactions that people have with businesses as well as if that does work, it should alleviate one of the biggest issues that we’re currently having around messaging monetization is that in order for a person to interact with a business, it’s quite human labor-intensive for a person to be on the other side of that interaction, which is one of the reasons why we’ve seen this take off in some countries where the cost of labor is relatively low. But you can imagine in a world where every business has an AI agent, that we can see the kind of success that we’re seeing in Thailand or Vietnam with business messaging could kind of spread everywhere. And I think that’s quite exciting.Both of these use cases — generative ads and click-to-message AI agents — are great examples as to why it makes sense for Meta to invest in its Llama models and make them open(ish): more and better AI means more and better creative and more and better agents, all of which can be monetized via advertising.Medium-Term: The Smiling Curve and Infinite ContentOf course all of this depends on people continuing to use Meta properties, and here AI plays an important role as well. First, there is the addition of Meta AI, which makes Meta’s apps more useful. Meta AI also opens the door to a search-like product, which The Information just reported the company was working on; potential search advertising is a part of the bull case as well, although for me a relatively speculative one.Second is the insertion of AI content into the Meta content experience, which Meta just announced it is working on. From The Verge:If you think avoiding AI-generated images is difficult as it is, Facebook and Instagram are now going to put them directly into your feeds. At the Meta Connect event on Wednesday, the company announced that it’s testing a new feature that creates AI-generated content for you “based on your interests or current trends” — including some that incorporate your face.When you come across an “Imagined for You” image in your feed, you’ll see options to share the image or generate a new picture in real time. One example (embedded below) shows several AI-generated images of “an enchanted realm, where magic fills the air.” But others could contain your face… which I’d imagine will be a bit creepy to stumble upon as you scroll…In a statement to The Verge, Meta spokesperson Amanda Felix says the platform will only generate AI images of your face if you “onboarded to Meta’s Imagine yourself feature, which includes adding photos to that feature” and accepting its terms. You’ll be able to remove AI images from your feed as well.This sounds like a company crossing the Rubicon, but in fact said crossing already happened a few years ago. Go back to 2015’s Facebook and the Feed, where I argued that Facebook was too hung up on being a social network, and concluded:Consider Facebook’s smartest acquisition, Instagram. The photo-sharing service is valuable because it is a network, but it initially got traction because of filters. Sometimes what gets you started is only a lever to what makes you valuable. What, though, lies beyond the network? That was Facebook’s starting point, and I think the answer to what lies beyond is clear: the entire online experience of over a billion people. Will Facebook seek to protect its network — and Zuckerberg’s vision — or make a play to be the television of mobile?It took Facebook another five years — and the competitive threat of TikTok — but the company finally did make the leap to showing you content from across the entire service, not just that which was posted by your network. The latter was an artificial limitation imposed by the company’s own self-conception of itself as a social network, when in reality it is a content network; true social networking — where you talk to people you actually know — happens in group chats:The structure of this illustration may look familiar; it’s another manifestation of The Smiling Curve, which I first wrote about in the context of publishing:Over time, as this cycle repeats itself and as people grow increasingly accustomed to getting most of their “news” from Facebook (or Google or Twitter), value moves to the ends, just like it did in the IT manufacturing industry or smartphone industry:On the right you have the content aggregators, names everyone is familiar with: Google ($369.7 billion), Facebook ($209.0 billion), Twitter ($26.4 billion), Pinterest (private). They are worth by far the most of anyone in this discussion. Traditional publishers, meanwhile, are stuck in the middle…publishers (all of them, not just newspapers) don’t really have an exclusive on anything anymore. They are Acer, offering the same PC as the next guy, and watching as the lion’s share of the value goes to the folks who are actually putting the content in front of readers.It speaks to the inevitability of the smiling curve that it has even come for Facebook (which I wrote about in 2020’s Social Networking 2.0); moving to global content and purely individualized feeds unconstrained by your network was the aforementioned Rubicon crossing. The provenance of that content is a tactical question, not a strategic one.To that end, I’ve heard whispers that these AI content tests are going extremely well, which raises an interesting financial question. One of Meta’s great strengths is that it gets its content for free from users. There certainly are costs incurred in personalizing your feed, but this is one of the rare cases where AI content is actually more expensive. It’s possible, though, that it simply is that much better and more engaging, in part because it is perfectly customized to you.This leads to a third medium-term AI-derived benefit that Meta will enjoy: at some point ads will be indistinguishable from content. You can already see the outlines of that given I’ve discussed both generative ads and generative content; they’re the same thing! That image that is personalized to you just might happen to include a sweater or a belt that Meta knows you probably want; simply click-to-buy.It’s not just generative content, though: AI can figure out what is in other content, including authentic photos and videos. Suddenly every item in that influencer photo can be labeled and linked — provided the supplier bought into the black box, of course — making not just every piece of generative AI a potential ad, but every piece of content period.The market implications of this are profound. One of the oddities of analyzing digital ad platforms is that some of the most important indicators are counterintuitive; I wrote this spring:The most optimistic time for Meta’s advertising business is, counter-intuitively, when the price-per-ad is dropping, because that means that impressions are increasing. This means that Meta is creating new long-term revenue opportunities, even as its ads become cost competitive with more of its competitors; it’s also notable that this is the point when previous investor freak-outs have happened.When I wrote that I was, as I noted in the introduction, feeling more cautious about Meta’s business, given that Reels is built out and the inventory opportunities of Meta AI were not immediately obvious. I realize now, though, that I was distracted by Meta AI: the real impact of AI is to make everything inventory, which is to say that the price-per-ad on Meta will approach $0 for basically forever. Would-be competitors are finding it difficult enough to compete with Meta’s userbase and resources in a probabilisitic world; to do so with basically zero price umbrella seems all-but-impossible.The Long-term: XR and Generative UINotice that I am thousands of words into this Article and, like Meta Myths, haven’t even mentioned VR or AR. Meta’s AI-driven upside is independent from XR becoming the platform of the future. What is different now, though, is that the likelihood of XR mattering feels dramatically higher than it did even six months ago.The first one is obviously Orion, which I wrote about last month. Augmented reality is definitely going to be a thing — I would buy a pair of Meta’s prototypes now if they were for sale.Once again, however, the real enabler will be AI. In the smartphone era, user interfaces started out being pixel perfect, and have gradually evolved into being declarative interfaces that scale to different device sizes. AI, however, will enable generative UI, where you are only presented with the appropriate UI to accomplish the specific task at hand. This will be somewhat useful on phones, and much more compelling on something like a smartwatch; instead of having to craft an interface for a tiny screen, generative UIs will surface exactly what you need when you need it, and nothing else.Where this will really make a difference is with hardware like Orion. Smartphone UI’s will be clunky and annoying in augmented reality; the magic isn’t in being pixel perfect, but rather being able to do something with zero friction. Generative UI will make this possible: you’ll only see what you need to see, and be able to interact with it via neural interfaces like the Orion neural wristband. Oh, and this applies to ads as well: everything in the world will be potential inventory.AI will have a similarly transformative effect on VR, which I wrote about back in 2022 in DALL-E, the Metaverse, and Zero Marginal Content. That article traced the evolution of both games and user-generated content from text to images to video to 3D; the issue is that games had hit a wall, given the cost of producing compelling 3D content, and that that challenge would only be magnified by the immersive nature of VR. Generative AI, though, will solve that problem:In the very long run this points to a metaverse vision that is much less deterministic than your typical video game, yet much richer than what is generated on social media. Imagine environments that are not drawn by artists but rather created by AI: this not only increases the possibilities, but crucially, decreases the costs.Here once again Meta’s advantages come to the fore: not only are they leading the way in VR with the Quest line of headsets, but they are also justified in building out the infrastructure necessary to generate metaverses — advertising included — because every part of their business benefits from AI.From Abundance to InfinityThis was all a lot of words to explain the various permutations of an obvious truth: a world of content abundance is going to benefit the biggest content Aggregator first and foremost. Of course Meta needs to execute on all of these vectors, but that is where they also benefit from being founder-led, particularly given the fact that founder seems more determined and locked in than ever.It’s also going to cost a lot of money, both in terms of training and inference. The inference part is inescapable: Meta may have a materially higher cost of revenue in the long run. The training part, however, has some intriguing possibilities. Specifically, Meta’s AI opportunities are so large and so central to the company’s future, that there is no question that Zuckerberg will spend whatever is necessary to keep pushing Llama forward. Other companies, however, with less obvious use cases, or more dependency on third-party development that may take longer than expected to generate real revenue, may at some point start to question their infrastructure spend, and wonder if it might make more sense to simply license Llama (this is where the “ish” part of “openish” looms large). It’s definitely plausible that Meta ends up being subsidized for building the models that give the company so much upside.Regardless, it’s good to be back on the Meta bull train, no matter what tomorrow’s earnings say about last quarter or next year. Stratechery from the beginning has been focused on the implications of abundance and the companies able to navigate it on behalf of massive user bases — the Aggregators. AI takes abundance to infinity, and Meta is the purest play of all.I wrote a follow-up to this Article in this Daily Update.https://www.youtube.com/embed/tYVWzBIVVN4?si=J_8a0p9WyF5RDQdp&controls=0
Elon Dreams and Bitter LessonsTuesday, October 15, 2024Listen to PodcastWatch on YouTubeListen to this post:Log in to listenIn the days after SpaceX’s awe-inspiring Starship launch-and-catch — watch the first eight minutes of this video if you haven’t yet — there was another older video floating around on X, this time of Richard Bowles, a former executive at Arianespace, the European rocket company. The event was the Singapore Satellite Industry Forum, and the year was 2013:https://videopress.com/embed/DYF1wrn8?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataThis morning, SpaceX came along and said, “We foresee a launch costing $7 million”. Well, ok, let’s ignore the 7, let’s say $15 million…at $15 million every operator would change their gameplane completely. Every supplier would change their gameplan completely. We wouldn’t be building satellites exactly as we are today, so a lot of these questions I think it might be interesting to go on that and say, “Where do you see your companies if you’re going to compete with a $15 million launch program.” So Richard, where do you see your company competing with a $15 million launch?”…RB: SpaceX is an interesting phenomenon. We saw it, and you just mentioned it, I thought it was $5 million or $7 million…Why don’t you take Arianespace instead of SpaceX first. Where would you compete with a $15 million launch?RB: I’ve got to talk about what I’m competing with, because that then predicates exactly how we will compete when we analyze what we are competing with. Obviously we like to analyze the competition.So today, SpaceX hasn’t launched into the geosynchronous orbit yet, they’re doing very well, their progress is going forward amazingly well, but I’m discovering in the market is that SpaceX primarily seems to be selling a dream, which is good, we should all dream, but I think a $5 million launch, or a $15 million launch, is a bit of a dream. Personally I think reusability is a dream. Recently I was at a session where I was told that there was no recovery plan because they’re not going to have any failures, so I think that’s a part of the dream.So at the moment, I feel that we’re looking, and you’re presenting to me, how am I going to respond to a dream? My answer to respond to a dream is that first of all, you don’t wake people up, they have to wake up on their own, and then once the market has woken up to the dream and the reality, then we’ll compete with that.But they are looking at a price which is about half yours today.RB: It’s a dream.Alright. Suppose that you wake up and they’re there, what would you Arianespace do.RB: We would have to react to it. They’re not supermen, so whatever they can do we can do. We would then have to follow. But today, at the moment…it is a theoretical question at this moment in time.I personally don’t believe it’s going to be theoretical for that much longer. They’ve done everything almost they said they would do. That’s true.The moderator ended up winning the day; in 2020 Elon Musk said on a podcast that the “best case” for Falcon 9 launches was indeed $15 million (i.e. most cost more, but that price point had been achieved). Of course customers pay a lot more: SpaceX charges a retail price of $67 million per launch, in part because it has no competition; Arianespace retired the Ariane 5 rocket, which had a retail launch price of $178 million, in 2023. Ariane 6 had its first launch this year, but it’s not price competitive, in part because it’s not reusable. From Politico:The idea of copying SpaceX and making Ariane partly reusable was considered and rejected. That decision haunts France’s Economy Minister Bruno Le Maire. “In 2014 there was a fork in the road, and we didn’t take the right path,” Le Maire said in 2020.But just because it works for Elon, doesn’t make it good for Europe. Once it’s up and running, Ariane 6 should have nine launches a year — of which around four will be for institutional missions, like government reconnaissance satellites and earth observation systems. The rest will be targeted at commercial clients.Compare that to SpaceX. Fed by a steady stream of Pentagon and industry contracts, in addition to missions for its own Starlink satellite constellation, Musk’s company carried out a record 96 launches in 2023.“It wasn’t that we just said reusability is bullshit,” said [former head the European Space Agency Jan] Wörner of the early talks around Ariane 6 in the mid-2010s, and the consideration of building reusable stages rather than burning through fresh components each mission. “If you have 10 flights per year and you are only building one new launcher per year then from an industrial point of view that’s not going to work.”Wörner’s statement is like Bowles in the way in which it sees the world as static; Bowles couldn’t see ahead to a world where SpaceX actually figured out how to reuse rockets by landing them on drone ships, much less the version 2 example of catching a much larger rocket that we saw this weekend. Wörner, meanwhile, can’t see backwards: the reason why SpaceX has so much more volume, both from external customers and from itself (Starlink), is because it is cheap. Cheapness creates scale, which makes things even cheaper, and the ultimate output is entirely new markets.The SpaceX DreamOf course Bowles was right in another way: SpaceX is a dream. It’s a dream of going to Mars, and beyond, of extending humanity’s reach beyond our home planet; Arianespace is just a business. That, though, has been their undoing. A business carefully evaluates options, and doesn’t necessarily choose the highest upside one, but rather the one with the largest expected value, a calculation that incorporates the likelihood of success — and even then most find it prudent to hedge, or build in option value.A dreamer, though, starts with success, and works backwards. In this case, Musk explained the motivation for driving down launch costs on X:First off, this made it imperative that SpaceX find a way to launch a massively larger rocket that is fully recoverable, and doesn’t include the weight and logistics costs of the previous approach (this weekend SpaceX caught the Super Heavy booster; the next step is catching the Starship spacecraft that sits above it). Once SpaceX can launch massively larger rockets cheaply, though, it can start to do other things, like dramatically expand Starlink capability.The next generation Starlink satellites, which are so big that only Starship can launch them, will allow for a 10X increase in bandwidth and, with the reduced altitude, faster latency https://t.co/HLYdjjia3o— Elon Musk (@elonmusk) October 14, 2024Starlink won’t be the only beneficiary; the Singapore moderator had it right back in 2013: everyone will change their gameplan completely, which will mean more business for SpaceX, which will only make things cheaper, which will mean even more business. Indeed, there is a window to rocketports that don’t have anything to do with Mars, but simply facilitate drastically faster transportation here on planet earth. The transformative possibilities of scale — and the dramatic decrease in price that follows — are both real and hard to imagine.Tesla’s Robotaxi PresentationThe Starship triumph wasn’t the only Musk-related story of the week: last Thursday Tesla held its We, Robot event where it promised to unveil its Robotaxi, and observers were considerably less impressed. From Bloomberg:Elon Musk unveiled Tesla Inc.’s highly anticipated self-driving taxi at a flashy event that was light on specifics, sending its stock sliding as investors questioned how the carmaker will achieve its ambitious goals. The chief executive officer showed off prototypes of a slick two-door sedan called the Cybercab late Thursday, along with a van concept and an updated version of Tesla’s humanoid robot. The robotaxi — which has no steering wheel or pedals — could cost less than $30,000 and “probably” will go into production in 2026, Musk said.The product launch, held on a movie studio lot near Los Angeles, didn’t address how Tesla will make the leap from selling advanced driver-assistance features to fully autonomous vehicles. Musk’s presentation lacked technical details and glossed over topics including regulation or whether the company will own and operate its own fleet of Cybercabs. As Jefferies analysts put it, Tesla’s robotaxi appears “toothless.”The underwhelming event sent Tesla’s shares tumbling as much as 10% Friday in New York, the biggest intraday decline in more than two months. They were down 7.6% at 12:29 p.m., wiping out $58 billion in market value. The stock had soared almost 70% since mid-April, largely in anticipation of the event. Uber Technologies Inc. and Lyft Inc., competing ride-hailing companies whose investors had been nervously awaiting the Cybercab’s debut, each surged as much as 11% Friday. Uber’s stock hit an all-time high.Tesla has a track record of blowing past timelines Musk has offered for all manner of future products, and has had a particularly difficult time following through on his self-driving forecasts. The CEO told investors in 2019 that Tesla would have more than 1 million robotaxis on the road by the following year. The company hasn’t deployed a single autonomous vehicle in the years since.First off, the shockingly short presentation — 22:44 from start to “Let’s get the party started” — was indeed devoid of any details about the Robotaxi business case. Secondly, all of the criticisms of Musk’s mistaken predictions about self-driving are absolutely true. Moreover, the fact of the matter is that Tesla is now far behind the current state-of-the-art, Waymo, which is in operation in four U.S. cities and about to start up in two more. Waymo has achieved Level 4 automation, while Tesla’s are stuck at Level 2. To review the levels of automation:
- Level 0: Limited features that provide warnings and momentary assistance (i.e. automatic emergency braking)
- Level 1: Steering or brake/acceleration automation (i.e. cruise control or lane centering)
- Level 2: Steering and brake/acceleration control, which must be constantly supervised (i.e. hands-on-wheel)
- Level 3: Self-driving that only operates under pre-defined conditions, and in which the driver must take control immediately when requested
- Level 4: Self-driving that only operates under pre-defined conditions, under which the driver is not expected to take control
- Level 5: Self-driving under all conditions, with no expectation of driver control
Waymo has two big advantages relative to Tesla: first, its cars have a dramatically more expansive sensor suite, including camera, radar, and LiDAR; the latter is the most accurate way to measure depth, which is particularly tricky for cameras and fairly imprecise for radar. Second, any Waymo car can be taken over by a remote driver any time it encounters a problem. This doesn’t happen often — once every 17,311 miles in sunny California last year — but it is comforting to know that there is a fallback.The challenge is that both of these advantages cost money: LiDAR is the biggest reason why the Generation 5 Waymo’s on the streets of San Francisco cost a reported $200,000; Generation 6 has fewer sensors and should be considerably cheaper, and prices will come down as Waymo scales, but this is still a barrier. Humans in data centers, meanwhile, sitting poised to take over a car that encounters trouble, are not just a cost center but also a limit on scalability. Then again, higher cost structures are its own limitation on scalability; Waymos are awesome but they will need to get an order of magnitude cheaper to change the world.The Autonomy DreamWhat was notable about Musk’s Tesla presentation is what it actually did include. Start with that last point; Musk’s focus was on that changing the world bit:https://videopress.com/embed/Ns0M3bek?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataYou see a lot of sci-fi movies where the future is dark and dismal. It’s not a future you want to be in. I love Bladerunner, but I don’t know if we want that future. I think we want that duster he’s wearing, but not the bleak apocalypse. We want to have a fun, exciting future that if you could look in a crystal ball and see the future, you’d be like “Yes, I wish that I could be there now”. That’s what we want.Musk proceeded to talk about having a lounge on wheels that gave you your time back and was safer to boot, and which didn’t need ugly parking lots; the keynote slides added parks to LAX and Sofi and Dodger Stadiums:https://videopress.com/embed/DgjXzBQz?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataOne of the things that is really interesting is how will this affect the cities that we live in. When you drive around a city, or the car drive you around the city, you see that there’s a lot of parking lots. There’s parking lots everywhere. There are parking garages. So what would happen if you have an autonomous world is that you can now turn parking lots into parks…there’s a lot of opportunity to create greenspace in the cities that we live in.This is certainly an attractive vision; it’s also far beyond the world of Uber and Lyft or even Waymo, which are focused on solving the world as it actually exists today. That means dealing with human drivers, which means there will be parking lots for a long time to come. Musk’s vision is a dream.What, though, would that dream require, if it were to come true? Musk said it himself: full autonomy provided by a fleet of low cost vehicles that make it silly — or prohibitively expensive, thanks to sky-rocketing insurance — for anyone to drive themselves. That isn’t Level 4, like Waymo, it’s Level 5, and, just as importantly, it’s cheap, because cheap drives scale and scale drives change.Tesla’s strategy for “cheap” is well-known: the company eschews LiDAR, and removed radar from new models a few years ago, claiming that it would accomplish its goals using cameras alone. Setting aside the viability of this claim, the connection to the the dream is clear: a cameras-only approach enables the low cost vehicles integral to Musk’s dream. Yes, Waymo equipment costs will come down with scale, but Waymo’s current approach is both safer in the present and also more limited in bringing about the future.What many folks seemed to miss in Musk’s presentation was his explanation as to how Tesla — and only Tesla — might get there.The Bitter LessonRich Sutton wrote one of the most important and provocative articles about AI in 2019; it’s called The Bitter Lesson:The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers’ belated learning of this bitter lesson, and it is instructive to review some of the most prominent.The examples Sutton goes over includes chess, where search beat deterministic programming, and Go, where unsupervised learning did the same. In both cases bringing massive amounts of compute to bear was both simpler and more effective than humans trying to encode their own shortcuts and heuristics. The same thing happened with speech recognition and computer vision: deep learning massively outperforms any sort of human-guided algorithms. Sutton notes towards the end:One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.It’s a brilliant observation, to which I might humbly add one additional component: while the Bitter Lesson is predicated on there being an ever-increasing amount of compute, which reliably solves once-intractable problems, one of the lessons of LLMs is that you also need an ever-increasing amount of data. Existing models are already trained on all of the data AI labs can get their hands on, including most of the Internet, YouTube transcripts, scanned books, etc.; there is much talk about creating synthetic data, both from humans and from other LLMs, to ensure that scaling laws continue. The alternative is that we hit the so-called “data wall”.LLMs, meanwhile, are commonly thought about in terms of language — it is in the name, after all — but what they actually predict are tokens, and tokens can be anything, including driving data. Timothy Lee explained some of Waymo’s research in this area at Understanding AI:Any self-driving system needs an ability to predict the actions of other vehicles. For example, consider this driving scene I borrowed from a Waymo research paper:Vehicle A wants to turn left, but it needs to do it without running into cars B or D. There are a number of plausible ways for this scene to unfold. Maybe B will slow down and let A turn. Maybe B will proceed, D will slow down, and A will squeeze in between them. Maybe A will wait for both vehicles to pass before making the turn. A’s actions depend on what B and D do, and C’s actions, in turn, depend on what A does.If you are driving any of these four vehicles, you need to be able to predict where the other vehicles are likely to be one, two, and three seconds from now. Doing this is the job of the prediction module of a self-driving stack. Its goal is to output a series of predictions that look like this:Researchers at Waymo and elsewhere struggled to model interactions like this in a realistic way. It’s not just that each individual vehicle is affected by a complex set of factors that are difficult to translate into computer code. Each vehicle’s actions depend on the actions of other vehicles. So as the number of cars increases, the computational complexity of the problem grows exponentially.But then Waymo discovered that transformer-based networks were a good way to solve this kind of problem.“In driving scenarios, road users may be likened to participants in a constant dialogue, continuously exchanging a dynamic series of actions and reactions mirroring the fluidity of communication,” Waymo researchers wrote in a 2023 research paper.Just as a language model outputs a series of tokens representing text, Waymo’s vehicle prediction model outputs a series of tokens representing vehicle trajectories—things like “maintain speed and direction,” “turn 5 degrees left,” or “slow down by 3 mph”.Rather than trying to explicitly formulate a series of rules for vehicles to follow (like “stay in your lane” and “don’t hit other vehicles”), Waymo trained the model like an LLM. The model learned the rules of driving by trying to predict the trajectories of human-driven vehicles on real roads.This data-driven approach allowed the model to learn subtleties of vehicle interactions that are not described in any driver manual and would be hard to capture with explicit computer code.This is not yet a panacea. Lee notes later in his article:One big problem Sinavski noted is that Wayve hasn’t found a vision-language model that’s “really good at spatial reasoning.” If you’re a long-time reader of Understanding AI, you might remember when I asked leading LLMs to tell the time from an analog clock or solve a maze. ChatGPT, Claude, and Gemini all failed because today’s foundation models are not good at thinking geometrically.This seems like it would be a big downside for a model that’s supposed to drive a car. And I suspect it’s why Waymo’s perception system isn’t just one big network. Waymo still uses traditional computer code to divide the driving scene up into discrete objects and compute a numerical bounding box for each one. This kind of pre-processing gives the prediction network a head start as it reasons about what will happen next.Another concern is that the opaque internals of LLMs make them difficult to debug. If a self-driving system makes a mistake, engineers want to be able to look under the hood and figure out what happened. That’s much easier to do in a system like Waymo’s, where some of the basic data structures (like the list of scene elements and their bounding boxes) were designed by human engineers.But the broader point here is that self-driving companies do not face a binary choice between hand-crafted code or one big end-to-end network. The optimal self-driving architecture is likely to be a mix of different approaches. Companies will need to learn the best division of labor from trial and error.That sounds right, but for one thing: The Bitter Lesson. To go back to Sutton:This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.If The Bitter Lesson ends up applying to true Level 5 autonomy, then Waymo is already signed up for school. A “mix of different approaches” clearly works better now, and may for the next few years, but does it get them to Level 5? And what of the data, to the extent it is essential to the The Sweet Outcome of self-taught AI? This was the part of the Tesla presentation I referenced above:https://videopress.com/embed/kVbQP6tY?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataOne of the reasons why the computer can be so much better than a person is that we have millions of cars that are training on driving. It’s like living millions of lives simultaneously and seeing very unusal situations that a person in their entire lifetime would not see, hopefully. With that amount of training data, it’s obviously going to be much better than what a human could be, because you can’t live a million lives. It can also see in all directions simultaneously, and it doesn’t get tired or text or any of those things, so it will naturally be 10x, 20x, 30x safer than a human for all those reasons.I want to emphasize that the solution that we have is AI and vision. There’s no expensive equipment needed. The Model 3 and Model Y and S and X that we make today will be capable of full autonomy unsupervised. And that means that our costs of producing the vehicle is low.Again, Musk has been over-promising and under-delivering in terms of self-driving for existing Tesla owners for years now, so the jury is very much out on whether current cars get full unsupervised autonomy. But that doesn’t change the fact that those cars do have cameras, and those cameras are capturing data and doing fine-tuning right now, at a scale that Waymo has no way of matching. This is what I think Andrej Karpathy, the former Tesla Autopilot head, was referring to in his recent appearance on the No Priors podcast:https://videopress.com/embed/lRzylm8a?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataI think people think that Waymo is ahead of Tesla, I think personally Tesla is ahead of Waymo, and I know it doesn’t look like that, but I’m still very bullish on Tesla and it’s self-driving program. I think that Tesla has a software problem, and I think Waymo has a hardware problem, is the way I put it, and I think software problems are much easier. Tesla has deployment of all these cars on earth at scale, and I think Waymo needs to get there. The moment Tesla gets to the point where they can actually deploy and it actually works I think is going to be really incredible…I’m not sure that people are appreciating that Tesla actually does use a lot of expensive sensors, they just do it at training time. So there a bunch of cars that drive around with LiDARS, they do a bunch of stuff that doesn’t scale and they have extra sensors etc., and they do mapping and all this stuff. You’re doing it at training time and then you’re distilling that into a test-time package that is deployed to the cars and is vision only. It’s like an arbitrage on sensors and expense. And so I think it’s actually kind of a brilliant strategy that I don’t think is fully appreciated, and I think is going to work out well, because the pixels have the information, and I think the network will be capable of doing that, and yes at training time I think these sensors are really useful, but I don’t think they are as useful at test time.Do note that Karpathy — who worked at Tesla for five years — is hardly a neutral observer, and also note that he forecasts a fully neural net approach to driving as taking ten years; that’s hardly next year, as Musk promised. That end goal, though, is Level 5, with low cost sensors and thus low cost cars, the key ingredient of realizing the dream of full autonomy and the transformation that would follow.The Cost of DreamsI don’t, for the record, know if the Tesla approach is going to work; my experience with both Waymo and Tesla definitely makes clear that Waymo is ahead right now (and the disengagement numbers for Tesla are multiple orders of magnitude worse). Most experts assume that LiDAR sensors are non-negotiable in particular.The Tesla bet, though, is that Waymo’s approach ultimately doesn’t scale and isn’t generalizable to true Level 5, while starting with the dream — true autonomy — leads Tesla down a better path of relying on nothing but AI, fueled by data and fine-tuning that you can only do if you already have millions of cars on the road. That is the connection to SpaceX and what happened this weekend: if you start with the dream, then understand the cost structure necessary to achieve that dream, you force yourself down the only path possible, forgoing easier solutions that don’t scale for fantastical ones that do.https://www.youtube.com/embed/ohYofmq8jJk?si=nCmOisjgsC_GF8Kt&controls=0
Enterprise Philosophy and The First Wave of AITuesday, September 24, 2024Listen to Podcast Watch on YouTubeListen to this post:Log in to listenThe popular history of technology usually starts with the personal computer, and for good reason: that was the first high tech device that most people ever used. The only thing more impressive than the sheer audacity of “A computer on every desk and in every home” as a corporate goal, is the fact that Microsoft accomplished it, with help from its longtime competitor Apple.In fact, though, the personal computer wave was the 2nd wave of technology, particularly in terms of large enterprises. The first wave — and arguably the more important wave, in terms of economic impact — was the digitization of back-end offices. These were real jobs that existed:These are bookkeepers and and tellers at a bank in 1908; fast forward three decades and technology had advanced:The caption for this 1936 Getty Images photo is fascinating:The new system of maintaining checking accounts in the National Safety Bank and Trust Company, of New York, known as the “Checkmaster,” was so well received that the bank has had to increase its staff and equipment. Instead of maintaining a mininum balance, the depositor is charged a small service charge for each entry on his statement. To date, the bank has attracted over 30,000 active accounts. Bookkeepers are shown as they post entries on “Checkmaster” accounts.It’s striking how the first response to a process change is to come up with a business model predicated on covering new marginal costs; only later do companies tend to consider the larger picture, like how low-marginal-cost checking accounts might lead to more business for the bank overall, the volume of which can be supported thanks to said new technology.Jump ahead another three decades and the back office of a bank looked like this:Now the image is color and all of the workers are women, but what is perhaps surprising is that this image is, despite all of the technological change that had happened to date, particularly in terms of typewriters and calculators, not that different from the first two.However, this 1970 picture was one of the last images of its kind: by the time this picture was taken, Bank of America, where this picture was taken, was already well on its way to transitioning all of its accounting and bookkeeping to computers; most of corporate America soon followed, with the two primary applications being accounting and enterprise resource planning. Those were jobs that had been primarily done by hand; now they were done by computers, and the hands were no longer needed.Tech’s Two PhilosophiesIn 2018 I described Tech’s Two Philosophies using the four biggest consumer tech companies: Google and Facebook were on one side, and Apple and Microsoft on the other.In Google’s view, computers help you get things done — and save you time — by doing things for you. Duplex was the most impressive example — a computer talking on the phone for you — but the general concept applied to many of Google’s other demonstrations, particularly those predicated on AI: Google Photos will not only sort and tag your photos, but now propose specific edits; Google News will find your news for you, and Maps will find you new restaurants and shops in your neighborhood. And, appropriately enough, the keynote closed with a presentation from Waymo, which will drive you…Zuckerberg, as so often seems to be the case with Facebook, comes across as a somewhat more fervent and definitely more creepy version of Google: not only does Facebook want to do things for you, it wants to do things its chief executive explicitly says would not be done otherwise. The Messianic fervor that seems to have overtaken Zuckerberg in the last year, though, simply means that Facebook has adopted a more extreme version of the same philosophy that guides Google: computers doing things for people.Google and Facebook’s approach made sense given their position as Aggregators, which “attract end users by virtue of their inherent usefulness.” Apple and Microsoft, on the other hand, were platforms born in an earlier age of computing:This is technology’s second philosophy, and it is orthogonal to the other: the expectation is not that the computer does your work for you, but rather that the computer enables you to do your work better and more efficiently. And, with this philosophy, comes a different take on responsibility. Pichai, in the opening of Google’s keynote, acknowledged that “we feel a deep sense of responsibility to get this right”, but inherent in that statement is the centrality of Google generally and the direct culpability of its managers. Nadella, on the other hand, insists that responsibility lies with the tech industry collectively, and all of us who seek to leverage it individually…This second philosophy, that computers are an aid to humans, not their replacement, is the older of the two; its greatest proponent — prophet, if you will — was Microsoft’s greatest rival, and his analogy of choice was, coincidentally enough, about transportation as well. Not a car, but a bicycle:https://videopress.com/embed/lydOPhqW?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataYou can see the outlines of this philosophy in these companies’ approaches to AI. Google is the most advanced, thanks to the way it saw the obvious application of AI to its Aggregation products, particularly search and advertising. Facebook, now Meta, has made major strides over the last few years as it has overhauled its recommendation algorithms and advertising products to also be probabilistic, both in response to the rise of TikTok in terms of customer attention, and the severing of their deterministic ad product by Apple’s App Tracking Transparency initiative. In both cases their position as Aggregators compelled them to unilaterally go out and give people stuff to look at.Apple, meanwhile, is leaning heavily into Apple Intelligence, but I think there is a reason its latest ad campaign feels a bit weird, above-and-beyond the fact it is advertising a feature that is not yet available to non-beta customers. Apple is associated with jewel-like devices that you hold in your hand and software that is accessible to normal people; asking your phone to rescue a fish funeral with a slideshow feels at odds with Steve Jobs making a movie on stage during the launch of iMovie:https://videopress.com/embed/VoaUOT90?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataThat right there is a man riding a bike.Microsoft CopilotMicrosoft is meanwhile — to the extent you count their increasingly fraught partnership with OpenAI — in the lead technically as well. Their initial product focus for AI, however, is decidedly on the side of being a tool to, as their latest motto states, “empower every person and every organization on the planet to achieve more.”CEO Satya Nadella said in the recent pre-recorded keynote announcing Copilot Wave 2:https://videopress.com/embed/i6oYpXss?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataYou can think of Copilot as the UI for AI. It helps you break down these siloes between your work artifacts, your communications, and your business processes. And we’re just getting started. In fact, with scaling laws, as AI becomes more capable and even agentic, the models themselves become more of a commodity, and all the value gets created by you how steer, ground, fine-tune these models with your business data and workflow. And how it composes with the UI layer of human to AI to human interaction becomes critical.Today we’re announcing Wave 2 of Microsoft 365 Copilot. You’ll see us evolve Copilot in three major ways: first, it’s about bringing the web plus work plus pages together as the new AI system for knowledge work. With Pages we’ll show you how Copilot can take any information from the web or your work and turn it into a multiplayer AI-powered canvas. You can ideate with AI and collaborate with other people. It’s just magical. Just like the PC birthed office productivity tools as we know them today, and the web makes those canvases collaborative, every platform shift has changed work artifacts in fundamental ways, and Pages is the first new artifact for the AI age.Notice that Nadella, like most pop historians (including yours truly!), is reaching out to draw a link to the personal computer, but here the relevant personal computer history casts more shadow than light onto Nadella’s analogy. The initial wave of personal computers were little more than toys, including the Commodore 64 and TRS-80, sold in your local Radio Shack; the Apple I, released in 1976, was initially sold as a bare circuit board:A year later Apple released the Apple II; now there was a case, but you needed to bring your own TV:Two years later and Apple II had a killer app that would presage the movement of personal computers into the workplace: VisiCalc, the first spreadsheet.VisiCalc’s utility for business was obvious — in fact, it was conceived of by Dan Bricklin while watching a lecture at Harvard Business School. That utility, though, was not about running business critical software like accounting or ERP systems; rather, an employee with an Apple II and VisiCalc could take the initiative to model their business and understand how things worked at a view grounded in a level of calculation that was both too much for one person, yet not sufficient to hire an army of backroom employees, or, increasingly at that point, reserve time on the mainframe.Notice, though, how this aligned with the Apple and Microsoft philosophy of building tools: tools are meant to be used, but they take volition to maximize their utility. This, I think, is a challenge when it comes to Copilot usage: even before Copilot came out employees with initiative were figuring out how to use other AI tools to do their work more effectively. The idea of Copilot is that you can have an even better AI tool — thanks to the fact it has integrated the information in the “Microsoft Graph” — and make it widely available to your workforce to make that workforce more productive.To put it another way, the real challenge for Copilot is that it is a change management problem: it’s one thing to charge $30/month on a per-seat basis to make an amazing new way to work available to all of your employees; it’s another thing entirely — a much more difficult thing — to get all of your employees to change the way they work in order to benefit from your investment, and to make Copilot Pages the “new artifact for the AI age”, in line with the spreadsheet in the personal computer age.Clippy and CopilotSalesforce CEO Marc Benioff was considerably less generous towards Copilot in last week’s Dreamforce keynote. After framing machine learning as “Wave 1” of AI, Benioff said that Copilots were Wave 2, and from Microsoft’s perspective it went downhill from there:https://videopress.com/embed/xK5y1Fw4?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataWe moved into this Copilot world, but the Copilot world has been kind of a hit-and-miss world. The Copilot world where customers have said to us “Hey, I got these Copilots but they’re not exactly performing as we want them to. We don’t see how that Copilot world is going to get us to the real vision of artificial intelligence of augmentation of productivity, of better business results that we’ve been looking for. We just don’t see Copilot as that key step for our future.” In some ways, they kind of looked at Copilot as the new Microsoft Clippy, and I get that.The Clippy comparison was mean but not entirely unfair, particularly in the context of users who don’t know enough to operate with volition. Former Microsoft executive Steven Sinofsky explained in Hard Core Software:Why was Microsoft going through all this and making these risky, or even edgy, products? Many seemed puzzled by this at the time. In order to understand that today, one must recognize that using a PC in the early 1990s (and before) was not just difficult, but it was also confusing, frustrating, inscrutable, and by and large entirely inaccessible to most everyone unless you had to learn how to use one for work.Clippy was to be a replacement for the “Office guru” people consulted when they wanted to do things in Microsoft Office that they knew were possible, but were impossible to discover; Sinofsky admits that a critical error was making Clippy too helpful with simple tasks, like observing “Looks like you’re trying to write a letter” when you typed “Dear John” and hit return. Sinofsky reflected:The journey of Clippy (in spite of our best efforts that was what the feature came to be called) was one that parallels the PC for me in so many ways. It was not simply a failed feature, or that back-handed compliment of a feature that was simply too early like so many Microsoft features. Rather Clippy represented a final attempt at trying to fix the desktop metaphor for typical or normal people so they could use a computer.What everyone came to realize was that the PC was a generational change and that for those growing up with a PC, it was just another arbitrary and random device in life that one just used. As we would learn, kids didn’t need different software. They just needed access to a PC. Once they had a PC they would make cooler, faster, and more fun documents with Office than we were. It was kids that loved WordArt and the new graphics in Word and PowerPoint, and they used them easily and more frequently than Boomers or Gen X trying to map typewriters to what a computer could do.It was not the complexity that was slowing people down, but the real concern that the wrong thing could undo hours of work. Kids did not have that fear (yet). We needed to worry less about dumbing the software down and more about how more complex things could get done in a way that had far less risk.This is a critical insight when it comes to AI, Copilot, and the concept of change management: a small subset of Gen Xers and Boomers may have invented the personal computer, but for the rest of their cohort it was something they only used if they had to (resentfully), and only then the narrow set of functionality that was required to do their job. It was the later generations that grew up with the personal computer, and hardly give inserting a table or graphic into a document a second thought (if, in fact, they even know what a “document” is). For a millenial using a personal computer doesn’t take volition; it’s just a fact of life.Again, though, computing didn’t start with the personal computer, but rather with the replacement of the back office. Or, to put it in rather more dire terms, the initial value in computing wasn’t created by helping Boomers do their job more efficiently, but rather by replacing entire swathes of them completely.Agents and o1Benioff implicitly agrees; the Copilot Clippy insult was a preamble to a discussion of agents:https://videopress.com/embed/BnLxMGnH?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataBut it was pushing us, and they were trying to say, what is the next step? And we are now really at that moment. That is why this show is our most important Dreamforce ever. There’s no question this is the most exciting Dreamforce and the most important Dreamforce. What you’re going to see at this show is technology like you have never seen before…The first time you build and deploy your first autonomous agent for your company that is going to help you to be more productive, to augment your employees, and to get these better business results, you’re going to remember that like the first time it was in y our Waymo. This is the 3rd wave of AI. It’s agents…Agents aren’t copilots; they are replacements. They do work in place of humans — think call centers and the like, to start — and they have all of the advantages of software: always available, and scalable up-and-down with demand.https://videopress.com/embed/Gs7bcY3B?hd=1&cover=1&loop=0&autoPlay=0&permalink=1&muted=0&controls=1&playsinline=0&useAverageColor=0&preloadContent=metadataWe know that workforces are overwhelmed. They’re doing these low-value tasks. They’ve got kind of a whole different thing post-pandemic. Productivity is at a different place. Capacity is at a different place…we do see that workforces are different, and we realize that 41% of the time seems to be wasted on low value and repetitive tasks, and we want to address that. The customers are expecting more: zero hold times, to be more personal and empathetic, to work with an expert all the time, to instantly schedule things. That’s our vision, our dream for these agents…What if these workforces had no limits at all? Wow. That’s kind of a strange thought, but a big one. You start to put all of these things together, and you go, we can kind of build another kind of company. We can build a different kind of technology platforms. We can take the Salesforce technology platform that we already have, and that all of you have invested so much into, the Salesforce Platform, and we can deliver the next capability. The next capability that’s going to make our companies more productive. To make our employees more augmented. And just to deliver much better business results. That is what Agentforce is.This Article isn’t about the viability of Agentforce; I’m somewhat skeptical, at least in the short term, for reasons I will get to in a moment. Rather, the key part is the last few sentences: Benioff isn’t talking about making employees more productive, but rather companies; the verb that applies to employees is “augmented”, which sounds much nicer than “replaced”; the ultimate goal is stated as well: business results. That right there is tech’s third philosophy: improving the bottom line for large enterprises.Notice how well this framing applies to the mainframe wave of computing: accounting and ERP software made companies more productive and drove positive business results; the employees that were “augmented” were managers who got far more accurate reports much more quickly, while the employees who used to do that work were replaced. Critically, the decision about whether or not make this change did not depend on rank-and-file employees changing how they worked, but for executives to decide to take the plunge.The Consumerization of ITWhen Benioff founded Salesforce in 1999, he came up with a counterintuitive logo:Of course Salesforce was software; what it was not was SOFTWARE, like that sold by his previous employer Oracle, which at that time meant painful installation and migrations that could take years, and even then would often fail. Salesforce was different: it was a cloud application that you never needed to install or update; you could simply subscribe.Cloud-based software-as-a-service companies are the norm now, thanks in part to Benioff’s vision. And, just as Salesforce started out primarily serving — you guessed it! — sales forces, SaaS applications can focus on individual segments of a company. Indeed, one of the big trends over the last decade were SaaS applications that grew, at least in the early days, through word-of-mouth and trialing by individuals or team leaders; after all, all you needed to get started was a credit card — and if there was a freemium model, not even that!This trend was part of a larger one, the so-called “consumerization of IT”. Douglas Neal and John Taylor, who first coined the term in 2001, wrote in a 2004 Position Paper:Companies must treat users as consumers, encouraging employee responsibility, ownership and trust by providing choice, simplicity and service. The parent/child attitude that many IT departments have traditionally taken toward end users is now obsolete.This is actually another way of saying what Sinofsky did: enterprise IT customers, i.e. company employees, no longer needed to be taught how to use a computer; they grew up with them, and expected computers to work the same way their consumer devices did. Moreover, the volume of consumer devices meant that innovation would now come from that side of technology, and the best way for enterprises to keep up would be to ideally adopt consumer infrastructure, and barring that, seek to be similarly easy-to-use.It’s possible this is how AI plays out; it is what has happened to date, as large models like those built by OpenAI or Anthropic or Google or Meta are trained on publicly available data, and then are available to be fine-tuned for enterprise-specific use cases. The limitation in this approach, though, is the human one: you need employees who have the volition to use AI with the inherent problems introduced by this approach, including bad data, hallucinations, security concerns, etc. This is manageable as long as a motivated human is in the loop; what seems unlikely to me is any sort of autonomous agent actually operating in a way that makes a company more efficient without an extensive amount of oversight that ends up making the entire endeavor more expensive.Moreover, in the case of Agentforce specifically, and other agent initiatives more generally, I am unconvinced as to how viable and scalable the infrastructure necessary to manage auto-regressive large language models will end up being. I got into some of the challenges in this Update:The big challenge for traditional LLMs is that they are path-dependent; while they can consider the puzzle as a whole, as soon as they commit to a particular guess they are locked in, and doomed to failure. This is a fundamental weakness of what are known as “auto-regressive large language models”, which to date, is all of them.To grossly simplify, a large language model generates a token (usually a word, or part of a word) based on all of the tokens that preceded the token being generated; the specific token is the most statistically likely next possible token derived from the model’s training (this also gets complicated, as the “temperature” of the output determines what level of randomness goes into choosing from the best possible options; a low temperature chooses the most likely next token, while a higher temperature is more “creative”). The key thing to understand, though, is that this is a serial process: once a token is generated it influences what token is generated next.The problem with this approach is that it is possible that, in the context of something like a crossword puzzle, the token that is generated is wrong; if that token is wrong, it makes it more likely that the next token is wrong too. And, of course, even if the first token is right, the second token could be wrong anyways, influencing the third token, etc. Ever larger models can reduce the likelihood that a particular token is wrong, but the possibility always exists, which is to say that auto-regressive LLMs inevitably trend towards not just errors but compounding ones.Note that these problems exist even with specialized prompting like insisting that the LLM “go step-by-step” or “break this problem down into component pieces”; they are still serial output machines that, once they get something wrong, are doomed to deliver an incorrect answer. At the same time, this is also fine for a lot of applications, like writing; where the problem manifests itself is with anything requiring logic or iterative reasoning. In this case, a sufficiently complex crossword puzzle suffices.That Update was about OpenAI’s new o1 model, which I think is a step change in terms of the viability of agents; the example I used in that Update was solving a crossword puzzle, which can’t be done in one-shot — but can be done by o1.o1 is explicitly trained on how to solve problems, and second, o1 is designed to generate multiple problem-solving streams at inference time, choose the best one, and iterate through each step in the process when it realizes it made a mistake. That’s why it got the crossword puzzle right — it just took a really long time.o1 introduces a new vector of potential improvement: while auto-regressive LLMs scaled in quality with training set size (and thus the amount of compute necessary), o1 scales inference. This image is from OpenAI’s announcement page:This second image is a potential problem in a Copilot paradigm: sure, a smarter model potentially makes your employees more productive, but those increases in productivity have to be balanced by both greater inference costs and more time spent waiting for the model (o1 is significantly slower than a model like 4o). However, the agent equation, where you are talking about replacing a worker, is dramatically different: there the cost umbrella is absolutely massive, because even the most expensive model is a lot cheaper, above-and-beyond the other benefits like always being available and being scalable in number.More importantly, scaling compute is exactly what the technology industry is good at. The one common thread from Wave 1 of computing through the PC through SaaS and consumerization of IT, is that problems gated by compute are solved not via premature optimizations but via the progression of processing power. The key challenge is knowing what to scale, and I believe OpenAI has demonstrated the architecture that will benefit from exactly that.Data and PalantirThat leaves the data piece, and while Benioff bragged about all of the data that Salesforce had, it doesn’t have everything, and what it does have is scattered across the phalanx of applications and storage layers that make up the Salesforce Platform. Indeed, Microsoft faces the same problem: while their Copilot vision includes APIs for 3rd-party “agents” — in this case, data from other companies — the reality is that an effective Agent — i.e. a worker replacement — needs access to everything in a way that it can reason over. The ability of large language models to handle unstructured data is revolutionary, but the fact remains that better data still results in better output; explicit step-by-step reasoning data, for example, is a big part of how o1 works.To that end, the company I am most intrigued by, for what I think will be the first wave of AI, is Palantir. I didn’t fully understand the company until this 2023 interview with CTO Shyam Sankar and Head of Global Commercial Ted Mabrey; I suggest reading or listening to the whole thing, but I wanted to call out this exchange in particular:Was there an aha moment where you have this concept — you use this phrase now at the beginning of all your financial reports, which is that you’re the operating system for enterprises. Now, obviously this is still the government era, but it’s interesting the S-1 uses that line, but it’s further down, it’s not the lead thing. Was that something that emerged later or was this that, “No, we have to be the interface for everything” idea in place from the beginning?Shyam Sankar: I think the critical part of it was really realizing that we had built the original product presupposing that our customers had data integrated, that we could focus on the analytics that came subsequent to having your data integrated. I feel like that founding trauma was realizing that actually everyone claims that their data is integrated, but it is a complete mess and that actually the much more interesting and valuable part of our business was developing technologies that allowed us to productize data integration, instead of having it be like a five-year never ending consulting project, so that we could do the thing we actually started our business to do.That integration looks like this illustration from the company’s webpage for Foundry, what they call “The Ontology-Powered Operating System for the Modern Enterprise”:What is notable about this illustration is just how deeply Palantir needs to get into an enterprise’s operations to achieve its goals. This isn’t a consumery-SaaS application that your team leader puts on their credit card; it is SOFTWARE of the sort that Salesforce sought to move beyond.If, however, you believe that AI is not just the next step in computing, but rather an entirely new paradigm, then it makes sense that enterprise solutions may be back to the future. We are already seeing that that is the case in terms of user behavior: the relationship of most employees to AI is like the relationship of most corporate employees to PCs in the 1980s; sure, they’ll use it if they have to, but they don’t want to transform how they work. That will fall on the next generation.Executives, however, want the benefit of AI now, and I think that benefit will, like the first wave of computing, come from replacing humans, not making them more efficient. And that, by extension, will mean top-down years-long initiatives that are justified by the massive business results that will follow. That also means that go-to-market motions and business models will change: instead of reactive sales from organic growth, successful AI companies will need to go in from the top. And, instead of per-seat licenses, we may end up with something more akin to “seat-replacement” licenses (Salesforce, notably, will charge $2 per call completed by one of its agents). Services and integration teams will also make a comeback. It’s notable that this has been a consistent criticism of Palantir’s model, but I think that comes from a viewpoint colored by SaaS; the idea of years-long engagements would be much more familiar to tech executives and investors from forty years ago.Enterprise PhilosophyMost historically-driven AI analogies usually come from the Internet, and understandably so: that was both an epochal change and also much fresher in our collective memories. My core contention here, however, is that AI truly is a new way of computing, and that means the better analogies are to computing itself. Transformers are the transistor, and mainframes are today’s models. The GUI is, arguably, still TBD.To the extent that is right, then, the biggest opportunity is in top-down enterprise implementations. The enterprise philosophy is older than the two consumer philosophies I wrote about previously: its motivation is not the user, but the buyer, who wants to increase revenue and cut costs, and will be brutally rational about how to achieve that (including running expected value calculations on agents making mistakes). That will be the only way to justify the compute necessary to scale out agentic capabilities, and to do the years of work necessary to get data in a state where humans can be replaced. The bottom line benefits — the essence of enterprise philosophy — will compel just that.And, by extension, we may be waiting longer than we expect for AI to take over the consumer space, at least at the scale of something like the smartphone or social media. That is good news for the MKBHDs of everything — the users with volition — but for everyone else the biggest payoff will probably be in areas like entertainment and gaming. True consumerization of AI will be left to the next generation who will have never known a world without it.https://www.youtube.com/embed/3cgjWHzFdj4?si=J7o0xEahi8W-k7qT&controls=0

Older Posts→Search

Intel’s CISC Moat

Intel’s Missed Opportunity

Intel’s Death

Saving Intel

You May Also Like

vim

园艺

QPS TPS

发表回复 取消回复

发表回复取消回复