
At the 1956 Dartmouth Workshop – the event widely considered to mark the birth of AI as an academic field – a small group of pioneers gathered in Hanover, NH. The iconic photo shows AI founding figures such as Marvin Minsky, John McCarthy, and Claude Shannon, who convened to “map out future paths” for machine intelligence.

The Meeting of the Minds

In the summer of 1956, a handful of scientists met at Dartmouth College to chase a radical idea: that machines might think. This gathering, later dubbed the Dartmouth Workshop, is now seen as the launch of AI as a field of research. Yet the dream of artificial intelligence was already stirring well before that. As early as the 1940s, with the invention of the first programmable digital computers, researchers mused about creating an “electronic brain”. Computing pioneers like Alan Turing had begun asking profound questions – Can machines think? – and devising tests for intelligence. The very term “Artificial Intelligence” was coined by John McCarthy in his 1955 proposal for the Dartmouth workshop, but the concept had been percolating from the moment humanity built machines that could crunch numbers. In those early days, computers were hulking, room-sized contraptions, yet people marveled at how these “giant electronic brains” might mimic human reasoning. Early successes, like programs that could play chess or prove logical theorems, fueled a sense that human-level AI was just around the corner.

The Early Dreams of Artificial Intelligence

The history of AI since that hopeful dawn has been a roller coaster of optimism and sobering reality. By the 1970s, it became clear that replicating the full breadth of human intelligence was far harder than early pioneers imagined. Periods of “AI winter” – when funding and interest dried up – followed cycles of hype. Yet progress never stopped. By the 21st century, improved algorithms, immense datasets, and exponentially more powerful computers led to breakthroughs that reignited AI like never before. The transformer neural network architecture introduced in 2017 proved a game-changer, yielding a new generation of AI systems that learn from vast swaths of data. As we stand today, AI is not a speculative idea but a tangible force – one poised to reshape economies and daily life. However, to appreciate how we got here, we must first understand what “artificial intelligence” really means, in its various forms.

Narrow, General, and Superintelligence: The Spectrum of AI

Not all AI is created equal. In fact, the term “artificial intelligence” spans a spectrum from the simple to the fantastical. At one end is Artificial Narrow Intelligence (ANI) – often just called narrow AI or weak AI. This is the only kind of AI that actually exists today. Narrow AI is specialized: it’s designed to excel at specific tasks, and indeed often far surpasses human abilities in its narrow domain. A spam filter that catches junk emails, a speech recognizer that transcribes your voice, or even a powerful chess-playing program – all these are narrow AIs. They can perform one job (or a limited set of jobs) extremely well, but they cannot generalize their knowledge beyond their training. As IBM’s AI team explains, “It can be trained to perform a single or narrow task... However, it can’t perform outside of its defined task”. Siri, Amazon’s Alexa, IBM’s Watson – even OpenAI’s celebrated ChatGPT – are all examples of narrow AI, each brilliant at one thing but ultimately confined to it.
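To make the contrast concrete, here is a toy sketch of one such narrow system: a miniature spam filter. The handful of emails and their labels are invented for illustration, and scikit-learn is assumed as the library; a real filter would learn from millions of messages, but the point is the same: the model does exactly one narrowly defined job.

```python
# Toy "narrow AI": a spam filter that does one job and nothing else.
# The emails and labels below are invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",           # spam
    "cheap meds limited time offer",  # spam
    "meeting moved to friday",        # legitimate
    "lunch with the team today",      # legitimate
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)          # turn words into count features
classifier = MultinomialNB().fit(X, labels)   # learn which words signal spam

new_emails = ["claim your free offer", "team meeting on friday"]
print(classifier.predict(vectorizer.transform(new_emails)))  # expected: [1 0]
```

However well it filters mail, this model cannot translate a sentence or play a game of chess; that confinement is exactly what “narrow” means.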

Within the realm of narrow AI lies a rich ecosystem of subfields and specialties that together make up the modern AI landscape. Natural Language Processing (NLP), for instance, focuses on enabling machines to understand and generate human language – from chatbots that hold conversations to translation tools. NLP combines techniques from linguistics and machine learning so that computers can grasp the meaning and intent behind our words. Then there is Computer Vision, the field that gives eyes to AI. Its goal is to teach computers to interpret visual data – identifying objects in photos, recognizing faces, analyzing medical images, and more. Through techniques like neural networks trained on millions of labeled pictures, vision systems can now “see” and categorize the world with astonishing accuracy. Meanwhile, Robotics marries AI with machines that act in the physical world. Robots can be as simple as a factory arm assembling cars or as complex as a humanoid that can walk and talk. Modern robots often rely on AI for perception and decision-making, allowing them to handle tasks like navigating a warehouse or assisting in surgery. As one overview puts it, robots are “programmed machines that can automatically carry out complex series of actions,” and AI-powered robots increasingly operate with autonomy, helping in manufacturing, healthcare, and beyond.

Another key domain is Reinforcement Learning (RL) – a technique inspired by how animals learn through reward and punishment. In RL, an AI agent (say, a game-playing program) isn’t explicitly told what to do; instead, it experiments in its environment and gets feedback in the form of rewards for desirable outcomes. Through trial and error, it gradually learns strategies that maximize its reward. This approach has led to remarkable achievements, from AIs that can trounce humans at complex games like Go, to systems that optimize industrial processes. There are also multimodal systems, a newer breed of AI that combines multiple forms of data – like vision and language together – in one model. Just as we humans learn by integrating what we see, hear, and read, multimodal AI attempts to understand context across text, images, audio, and more. An example is an AI that can look at a photo and describe it in words, blending computer vision and NLP. These systems reflect a step toward more human-like cognition, breaking out of single-sense silos.
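As a concrete sketch of the reinforcement learning loop described above (not the multimodal part), the snippet below runs tabular Q-learning on a made-up five-cell corridor: the agent starts at one end, only reaching the far end yields a reward, and over many episodes trial and error teaches it to always step right. This is a minimal illustration, orders of magnitude simpler than Go or industrial control.

```python
# Tabular Q-learning on a tiny corridor world (a made-up toy environment).
import random

random.seed(0)
n_states = 5            # the agent starts in cell 0; cell 4 is the goal
actions = [-1, +1]      # step left or step right
Q = [[0.0, 0.0] for _ in range(n_states)]   # learned value of each action in each cell
alpha, gamma, epsilon = 0.5, 0.9, 0.1       # learning rate, discount, exploration rate

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Explore occasionally; otherwise exploit the best-known action.
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda i: Q[state][i])
        next_state = min(max(state + actions[a], 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0   # feedback only at the goal
        # Update the estimate of "how good is this action here" from the feedback.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

# The learned policy: action index 1 ("step right") in every non-goal cell.
print([max((0, 1), key=lambda i: Q[s][i]) for s in range(n_states - 1)])
```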

If narrow AI is defined by specialization, the next level – often discussed but not yet achieved – is Artificial General Intelligence (AGI). An AGI would be a system with the flexible, general-purpose intelligence of a human being. It wouldn’t be limited to one task or domain; it could reason, learn, and apply knowledge to vastly different problems. In essence, AGI could understand and do almost anything a person could, across contexts. As of today, AGI remains hypothetical. No machine can wake up one morning and decide to switch from playing chess to composing symphonies unless it’s specifically trained for both. However, the concept of AGI looms large in the AI research community – it’s the long-term north star for organizations like OpenAI and DeepMind, who explicitly aim to build more general intelligence. Achieving AGI would likely require fundamental breakthroughs beyond today’s algorithms, but if it ever comes, it could usher in transformations perhaps on the scale of the industrial revolution or greater.

Beyond AGI lies the even more speculative realm of Artificial Superintelligence (ASI) – AI that surpasses human intelligence by such a margin that it can outperform us in virtually every domain, including scientific creativity, social skills, and general wisdom. Superintelligent AI, as imagined by futurists, might not only solve problems faster than any genius, but could develop abilities utterly beyond human ken. It’s the kind of AI that dominates science fiction – sometimes as a benevolent godlike presence, sometimes as a threat to human existence. While this might sound far-fetched, serious thinkers do contemplate ASI as a possible outcome of continued AI progress. If an AI can improve itself (by rewriting its own code or designing better machines), there’s a scenario where it enters a feedback loop of self-improvement, rapidly bootstrapping its intelligence to levels we can barely fathom. For now, ASI is purely theoretical – a thought experiment about the ultimate potential (and peril) of AI. Yet it’s important to frame this spectrum of AI – from the narrow systems all around us today, to the general and super intelligences that spark debates about the future of humanity. This report will focus largely on the spectacular progress in narrow AI and hints of general AI to come, because that is where we stand in 2025: on the doorstep of systems that remain narrow, but are becoming astonishingly powerful and broad in their capabilities within that narrowness.

Teaching Machines to Learn: How AI Models Are Built

If early AI programs were hand-crafted by programmers (with humans figuring out rules and logic), today’s most advanced AI systems are largely learned by the machines themselves – albeit with huge amounts of human guidance, data, and computing power behind the scenes. At the heart of the current AI revolution are what researchers call foundation models: giant neural networks trained on “mountains of raw data” to create a versatile base of knowledge. A prime example is the class of large language models (LLMs) – like OpenAI’s GPT-4 or Google’s PaLM – which are trained on virtually the entire internet (trillions of words of text) to develop a broad statistical understanding of language. These models are not programmed with explicit rules; instead, through a process called self-supervised learning, they learn patterns, grammar, facts and even subtle connections in data by simply trying to predict what comes next in a sequence. Over many epochs of training, an LLM effectively “reads” everything from Wikipedia articles to classic literature to forum posts, gradually adjusting millions (or billions) of neural weights to distill the essence of human language. The end result is a model that can generate coherent text, answer questions, write code, or engage in conversation, drawing on the vast knowledge implicitly stored in its weights.
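A heavily simplified sketch of that “predict what comes next” objective, assuming PyTorch and a twelve-word toy corpus in place of internet-scale text and billions of weights:

```python
# Toy next-token prediction: the self-supervised objective described above,
# shrunk to a dozen words and a two-layer model for illustration.
import torch
import torch.nn as nn

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in corpus])

# Each word is trained to predict the word that follows it; no labels needed.
inputs, targets = ids[:-1], ids[1:]

model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(inputs)           # scores for every possible next word
    loss = loss_fn(logits, targets)  # how surprised the model is by the true next word
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, ask which words tend to follow "the".
probs = torch.softmax(model(torch.tensor([stoi["the"]])), dim=-1)
print({w: round(probs[0, stoi[w]].item(), 2) for w in vocab})
```

Scaled up by many orders of magnitude in data, parameters, and compute, this same objective is essentially what pre-training a large language model amounts to.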

Building such a model is a feat that combines data, computation, and human feedback in roughly three stages. The first stage is pre-training: feeding the neural network an enormous corpus of unlabeled data (for example, all forms of text) and training it to predict missing pieces. This task sounds simple – guess the next word in a sentence, for instance – but doing so at internet scale forces the model to absorb syntax, semantics, and factual content from its training set. The scale here is almost unimaginable: OpenAI’s earlier GPT-3 model, for instance, was trained on a dataset of roughly 500 billion tokens of text and had 175 billion parameters in the network. Google’s more recent models have been even larger, measured in trillions of parameters. As one industry report noted, the size of these models has been doubling every few months in recent years. The training process for a single large model can cost tens of millions of dollars in cloud computing time, running on thousands of cutting-edge GPU chips in parallel. (OpenAI famously trained GPT-3 on a supercomputer built with 10,000 NVIDIA GPUs, humming away for weeks.) The compute scale is so great that it has been compared to a space launch – and indeed, only a handful of tech companies in the world have the resources to train such models from scratch.
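To give a feel for that scale, here is a back-of-the-envelope estimate using the widely cited approximation that training compute is roughly 6 × parameters × training tokens. The per-GPU throughput below is an assumed round number (real utilization varies widely), so the result is only an order-of-magnitude illustration.

```python
# Rough training-compute arithmetic using the common rule of thumb
# FLOPs ≈ 6 × parameters × training tokens (an approximation, not an exact cost).
params = 175e9            # GPT-3-scale parameter count
tokens = 300e9            # roughly the number of tokens GPT-3 saw during training
total_flops = 6 * params * tokens

gpu_throughput = 25e12    # assumed sustained FLOP/s per GPU (hypothetical, well below peak)
num_gpus = 10_000         # the cluster size cited above
seconds = total_flops / (gpu_throughput * num_gpus)

print(f"~{total_flops:.1e} FLOPs, roughly {seconds / 86400:.0f} days on the assumed cluster")
```

Under these assumptions the run lands at around 3 × 10²³ floating-point operations and a couple of weeks of wall-clock time, consistent with the “humming away for weeks” picture above.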

After this brute-force learning phase, the AI model emerges with a vast but raw capability. It “knows” a lot in a statistical sense, yet it might not reliably do what users want. So a second phase is often applied: fine-tuning the model on specific tasks or with specific guidelines. This is where data labeling and human expertise come in. Companies will curate datasets of question-answer pairs, correct solutions, or demonstrations of desired behaviors, and further train the model on this supervised data. For example, an LLM might be fine-tuned on hundreds of thousands of example dialogues so that it learns to produce more helpful, conversational answers. Crucially, a recent innovation in steering these models is Reinforcement Learning from Human Feedback (RLHF). In RLHF, human reviewers are asked to rank or grade the model’s outputs, and those judgments are used to train a reward model. The language model then undergoes a reinforcement learning process to optimize its outputs for higher reward – effectively tuning it to align with human preferences and values. OpenAI employed this technique to make ChatGPT more polite, accurate, and safe: after the initial model was built, it was fine-tuned with demonstrations of good answers, and then further refined by learning from human feedback on its responses. The result is an AI that not only can generate text, but generally knows how it should respond to be useful.
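A minimal sketch of the reward-model step in RLHF, assuming PyTorch and stand-in embedding vectors in place of real model outputs: human preferences are reduced to “chosen” versus “rejected” pairs, and a small network is trained so that preferred responses score higher.

```python
# Sketch of a reward model trained on pairwise human preferences.
# Responses are represented here by random stand-in vectors; a real system
# scores full token sequences using the language model itself.
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 32
reward_model = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Dummy preference data: for each prompt, an embedding of the answer humans
# preferred ("chosen") and of the answer they ranked lower ("rejected").
chosen = torch.randn(256, dim) + 0.5
rejected = torch.randn(256, dim) - 0.5

for step in range(200):
    r_chosen = reward_model(chosen)       # scalar reward for preferred answers
    r_rejected = reward_model(rejected)   # scalar reward for dispreferred answers
    # Pairwise (Bradley-Terry style) loss: push chosen rewards above rejected ones.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(reward_model(torch.randn(1, dim) + 0.5).item())  # score for a new "good-ish" answer
```

In the full pipeline, this learned reward then drives the reinforcement learning phase (OpenAI used the PPO algorithm for this step) that nudges the language model toward higher-scoring outputs.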

It’s hard to overstate the importance of data quality in this process. An old maxim in computing is “garbage in, garbage out,” and it holds doubly true for AI models. These systems are only as good as the data they are trained on – they are essentially mirror reflections of the material they ingest. Feed a model on a diet of disinformation or erratic internet chatter and it will happily absorb those flaws. That’s why AI builders invest heavily not just in big data, but clean and curated data. Teams spend thousands of hours filtering out problematic content, correcting errors, and balancing the training mix. For example, the team behind the Falcon LLM (a project from the UAE) emphasized how they “spent a lot of time building the right dataset” because the volume of data alone isn’t enough – quality and relevance are paramount to performance. Similarly, the “Chinchilla” research by DeepMind in 2022 showed that many models had been under-trained relative to their size, and that feeding models with more data (even without increasing size) yielded better results, upending earlier assumptions. This spurred a new focus on assembling ever larger and richer datasets – to the point that some experts worry we might even “run out of Internet data” suitable for training within a few years. That may be hyperbole, but it underscores how voracious these AIs are: a single training run can consume the textual equivalent of millions of books.
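The Chinchilla result is often summarized as a rule of thumb of roughly twenty training tokens per model parameter. The sketch below simply applies that ratio to a few model sizes; it is an approximation for intuition, not an exact prescription.

```python
# Compute-optimal data budgets under the ~20 tokens-per-parameter rule of thumb
# popularized by the Chinchilla paper (an approximation, not a hard law).
def chinchilla_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    return params * tokens_per_param

for params in (1e9, 70e9, 175e9):
    print(f"{params / 1e9:>5.0f}B parameters -> ~{chinchilla_tokens(params) / 1e12:.2f}T training tokens")
```

By this yardstick, a 70-billion-parameter model wants on the order of 1.4 trillion training tokens, which is one reason the hunt for ever larger, cleaner text corpora has become so intense.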

Finally, once an AI model is trained and fine-tuned, it enters deployment and continual improvement. These foundation models become the bedrock upon which myriad applications are built – and they can be adapted to tasks far beyond what they were originally trained on. A language model like GPT-4, for instance, can be further tuned to become a coding assistant, a legal question-answerer, or a customer service chatbot with relatively small additional training on domain-specific examples. This reusability is why they’re called foundation models: much like a foundation, they support many higher-level tools and applications. Companies now compete not just on building models, but on fine-tuning and serving them efficiently via cloud APIs, reaching developers and end-users worldwide. And because training from scratch is so costly, there’s a trend toward open-sourcing models or using pretrained backbones, so that the wheel isn’t reinvented each time. The open-source community – aided by models like Meta’s LLaMA, released freely to researchers – has further democratized AI development, allowing enthusiasts and smaller firms to experiment on top of cutting-edge models without footing the multi-million-dollar training bills.
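A minimal sketch of that adaptation step, assuming the Hugging Face transformers library and the small, publicly available gpt2 checkpoint; the two “legal Q&A” strings are invented placeholders for a curated domain dataset, and a production fine-tune would involve far more data, batching, and evaluation.

```python
# Fine-tuning a pretrained backbone on a handful of domain examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Invented stand-ins for a domain-specific dataset (e.g. legal Q&A pairs).
examples = [
    "Q: What does clause 4.2 cover? A: Limitation of liability.",
    "Q: Who signs the addendum? A: Both contracting parties.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()

for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # For causal language models, passing labels=input_ids computes the
        # next-token prediction loss over the example.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The same pattern, scaled to many thousands of examples (and often using parameter-efficient methods such as LoRA rather than updating every weight), is how a general foundation model becomes a coding assistant or a customer-service bot.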

In summary, creating a modern AI like a large language model is akin to training a digital brain. You immerse it in data (books, websites, recordings), shape it with algorithms and feedback, and then set it loose to perform tasks. The secret sauce is not a single breakthrough but a pipeline: massive data, massive compute, clever architectures, and human-in-the-loop refinement. This pipeline has yielded AI models that astonish with their fluency and capabilities, yet also behave strangely at times – a reminder that they learned everything from human-created content, and thus reflect our knowledge, biases, and even our flaws. As we use these models, we are effectively tapping into an aggregate of human culture and information, distilled and remixed by a machine. It’s both exhilarating and humbling to realize that teaching machines to learn in this way has been one of the defining triumphs of 21st-century technology – and it is the engine now driving a fierce global race for AI supremacy.

From Siri to ChatGPT: The Evolution of AI Assistants