
      Goldman Sachs’ Neema Raphael: Enterprise Data Is Key to AI’s Future

      In a recent episode of Exchanges at Goldman Sachs, Neema Raphael, Chief Data Officer and Head of Data Engineering at Goldman Sachs, sat down with George Lee, Co-Head of the Goldman Sachs Global Institute, and Allison Nathan, Senior Strategist in Goldman Sachs Research, to discuss the evolving role of data in artificial intelligence and how the enterprise world may hold the key to AI’s future.

      Reflecting on the shift in computer science over the last several decades, Raphael described a major turning point. “For the first 50 or 60 years of computer science, humans had to code rules to tell the computer what to do. And so there was a fundamental shift…which is like learn by example instead of learn by rules.”
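
      As a rough illustration of that shift (mine, not the podcast’s), compare a hand-written rule with a model that induces the same decision from labeled examples; the scikit-learn pipeline below is a generic sketch, not anything Goldman-specific.

        # Learn by rules: a human encodes the decision logic explicitly.
        def is_spam_by_rules(text: str) -> bool:
            return "free money" in text.lower() or "act now" in text.lower()

        # Learn by example: the same decision is induced from labeled data.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        examples = ["free money inside", "act now and win",
                    "meeting at 3pm", "quarterly report attached"]
        labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

        model = make_pipeline(CountVectorizer(), LogisticRegression())
        model.fit(examples, labels)
        print(model.predict(["free money, act now"]))  # nobody hand-coded this rule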

      He said generative AI is part of that same trajectory: “In some ways, the generative AI stuff is just a continuation of learn by example. But I don’t think people naturally saw it go from, hey, I could learn maybe how to predict some patterns, to now the computer could create anything.”

      According to Raphael, this ability to generate content (language, images, audio) is what marks generative AI as a “novel step change”.

      When asked about how people inside organizations are adapting to the probabilistic nature of AI, Raphael said finance may be somewhat more prepared than other industries. “In finance, people maybe have understood that because of our pricing models and derivatives pricing… it was always stochastic in that way anyways.”

      “So I think there was maybe a willingness to sort of understand that here in the finance world,” he added.
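
      For readers outside finance, a minimal Monte Carlo sketch (an illustration under textbook assumptions, not any Goldman model) shows what “stochastic” means here: a derivative’s price is an average over many randomly simulated outcomes, so two runs give slightly different estimates.

        import math
        import random

        # Price a European call by averaging discounted payoffs over
        # random terminal prices (geometric Brownian motion).
        def mc_call_price(s0=100.0, strike=100.0, rate=0.02,
                          vol=0.2, t=1.0, n_paths=100_000):
            total = 0.0
            for _ in range(n_paths):
                z = random.gauss(0.0, 1.0)  # random market shock
                s_t = s0 * math.exp((rate - 0.5 * vol**2) * t + vol * math.sqrt(t) * z)
                total += max(s_t - strike, 0.0)  # call payoff at expiry
            return math.exp(-rate * t) * total / n_paths

        print(mc_call_price())  # re-running yields a slightly different answer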

      Still, he acknowledged the challenge in helping non-engineers understand that AI is not a magical prediction engine. “When non-engineers sit at a computer, they sort of want a thing to be a repeatable pattern. That’s how we build workflows here…So I think it’s really about teaching people: this isn’t just some magic crystal ball. What it’s really doing is taking a lot of examples and giving you an extrapolation.”

      Raphael said he has historically been a skeptic of new technology hype, citing blockchain as an example. But AI has been different. “I think, from an AI perspective, it’s obvious that it’s real. It’s here to stay. There is absolutely a hype to it…But also, when you go on your phone and you ask Claude, Gemini, GPT, take a picture…and you get great answers…It’s definitely, definitely real in the sort of consumer world.”

      The shift in his own view came through a hands-on experience. “Agent coding, for example, is the thing that sort of flipped my brain from this might be vaporware to like, wow, this is really real…It was helping me with problems that I’ve never been able to solve before.”

      He described it as “incredibly powerful as a superhuman ability”.

      When asked whether AI might eventually run out of data to learn from, Raphael replied bluntly: “We’ve already run out of data.”

      He pointed out that many recent model advancements appear to be happening with less training data or lower compute costs. One hypothesis, he said, is that “they trained against another model,” meaning that newer models are being trained on the outputs of older models.
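
      In machine-learning terms, “trained against another model” describes distillation: an existing model’s outputs become the training labels for a new one. A minimal sketch with a hypothetical teacher and student, both invented for illustration:

        import numpy as np

        rng = np.random.default_rng(0)

        # A hypothetical "teacher": an already-trained model we can only query.
        def teacher(x):
            return np.sin(x)  # stands in for a large model's outputs

        # Querying the teacher generates synthetic labels -- no new
        # human-produced data is required.
        x_train = rng.uniform(-3.0, 3.0, size=1000)
        y_train = teacher(x_train)

        # Fit a small "student" (here, a polynomial) to the teacher's outputs.
        student = np.poly1d(np.polyfit(x_train, y_train, deg=7))

        x_test = np.array([0.5, 1.0, 2.0])
        print(student(x_test))  # closely tracks teacher(x_test)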

      What’s more important now, he argued, is how synthetic data and trapped enterprise data will shape the next wave of progress. “The explosive nature of the synthetic data and the fact that now the computer could generate an infinite amount of more data…I don’t think it’s going to be a massive constraint.”

      He added: “There’s still a lot of data here at Goldman that can be used” to augment the work of salespeople, traders, clients, and portfolio managers through tools that provide “information synthesis” and support hypothesis testing.

      Raphael explained that data within companies was long treated as a byproduct of operations. “It’s always historically been thought of as like business exhaust in some way, right? Like, trader executes a trade—they’re sort of like, okay, I’m done now. I’m just managing the risk.”

      But beneath that surface is a wealth of structured and unstructured data that, if properly integrated and understood, can power AI applications. The key challenge, he said, is to “get that disparate data into some place where you could organize it in a sane way” and normalize it “where the data is correct”.
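
      In practice, “organizing it in a sane way” usually means normalizing records from different systems into one canonical schema. The field names and formats below are invented for illustration:

        from datetime import datetime

        # Two hypothetical upstream systems describe the same trade differently.
        system_a = {"ticker": "GS", "qty": "1,500", "trade_dt": "01/27/2026"}
        system_b = {"symbol": "GS", "quantity": 1500, "date": "2026-01-27"}

        # Normalize both into a single canonical record so downstream
        # applications see one consistent, correct shape.
        def normalize_a(rec):
            return {
                "symbol": rec["ticker"],
                "quantity": int(rec["qty"].replace(",", "")),
                "trade_date": datetime.strptime(rec["trade_dt"], "%m/%d/%Y").date(),
            }

        def normalize_b(rec):
            return {
                "symbol": rec["symbol"],
                "quantity": int(rec["quantity"]),
                "trade_date": datetime.strptime(rec["date"], "%Y-%m-%d").date(),
            }

        assert normalize_a(system_a) == normalize_b(system_b)  # same trade, one shape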

      Understanding how different pieces of data connect is essential. “You have to understand, are these two concepts the same? Are they linked differently?” Raphael said. That’s the foundation of data engineering: “People are like, we need a practice of engineering that’s like software for data.”
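
      A toy version of that linking question, with a matching heuristic invented for illustration, asks whether two records refer to the same entity:

        from difflib import SequenceMatcher

        # Crude entity linking: fuzzy name similarity plus agreement on a
        # second attribute decides whether two records are the same concept.
        def same_entity(a, b, threshold=0.9):
            score = SequenceMatcher(None, a["name"].lower().strip(),
                                    b["name"].lower().strip()).ratio()
            return score >= threshold and a.get("country") == b.get("country")

        rec1 = {"name": "ACME Corp ", "country": "US"}
        rec2 = {"name": "Acme Corp.", "country": "US"}
        rec3 = {"name": "ACME Corp", "country": "DE"}

        print(same_entity(rec1, rec2))  # True: same concept, different formatting
        print(same_entity(rec1, rec3))  # False: linked differently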

      Asked whether AI models themselves could help with this process, Raphael said: “Definitely. People have built software agents…to do this cleansing, this normalization, this linking.”

      He described a growing synergy between model development and data quality improvement: “There’s also a feedback loop of data cleansing and normalization and wrangling too.”
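
      A schematic of that loop, with the agent’s suggestion step stubbed out as a hypothetical function (in a real system it might call a language model):

        # Validate, ask the "agent" for a fix, re-validate, keep what passes.
        # suggest_fix is a hypothetical stand-in for a model-backed agent.
        def validate(record):
            name = record["name"]
            return bool(name) and name == name.strip() and name == name.title()

        def suggest_fix(record):
            return {**record, "name": record["name"].strip().title()}

        records = [{"name": "  neema raphael"}, {"name": "George Lee"}]

        cleaned = []
        for rec in records:
            if not validate(rec):
                rec = suggest_fix(rec)  # the agent proposes a correction
            if validate(rec):           # cleaner data feeds back into the pipeline
                cleaned.append(rec)

        print(cleaned)  # both records, normalized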
