On June 29, Palo Alto-based Inflection AI announced the completion of a $1.3 billion raise led by Microsoft, Reid Hoffman, Bill Gates, Eric Schmidt and Nvidia. The new capital will be partly allocated to building a cluster of 22,000 Nvidia H100 Tensor Core GPUs, which the company claims is the largest in the world. The GPUs will be used to train large-scale artificial intelligence models. The company wrote:
“We estimate that if we entered our cluster in the recent TOP500 list of supercomputers, it would be the 2nd and close to the top entry, despite being optimized for AI — rather than scientific — applications.”
Inflection AI is also developing its own personal assistant AI system dubbed “Pi.” The firm described Pi as “a teacher, coach, confidante, creative partner, and sounding board” that can be accessed directly via social media or WhatsApp. The company’s total funding has reached $1.525 billion since its inception in early 2022.
Despite the growing investment in large AI models, experts have warned that the models' actual training efficiency can be severely restricted by current technological limitations. Singaporean venture fund Foresight raised one such limitation, citing the example of a 175-billion-parameter AI model whose parameters occupy 700GB:
“Assuming we have 100 computing nodes and each node needs to update all parameters at each step, each step would require transmitting about 70TB of data (700GB*100). If we optimistically assume that each step takes 1s, then 70TB of data would need to be transmitted per second. This demand for bandwidth far exceeds the capacity of most networks.”
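Foresight's back-of-envelope arithmetic can be reproduced in a few lines. This is only a sketch of the figures quoted above (700GB of parameters, 100 nodes, an optimistic 1-second step), not a model of any real training setup:

```python
# Back-of-envelope reproduction of Foresight's bandwidth estimate.
# Figures are taken from the quote; the 1 s step time is their
# optimistic assumption, not a measured value.

PARAMS_GB = 700      # parameter payload per full update, in GB
NUM_NODES = 100      # computing nodes, each updating all parameters
STEP_TIME_S = 1.0    # assumed (optimistic) time per training step

total_per_step_tb = PARAMS_GB * NUM_NODES / 1000   # GB -> TB
bandwidth_tb_per_s = total_per_step_tb / STEP_TIME_S

print(f"Data moved per step: {total_per_step_tb:.0f} TB")
print(f"Required bandwidth:  {bandwidth_tb_per_s:.0f} TB/s")
# → Data moved per step: 70 TB
# → Required bandwidth:  70 TB/s
```

Even under the optimistic 1-second step, the implied 70 TB/s of aggregate bandwidth is far beyond what typical datacenter networks provide.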
Continuing from the above example, Foresight also warned that “due to communication latency and network congestion, data transmission time might far exceed 1s,” meaning that computing nodes could spend most of their time waiting for data transmission instead of performing actual computation. Given these constraints, Foresight analysts concluded that the solution lies in small AI models, which are “easier to deploy and manage.”
“In many application scenarios, users or companies do not need the more universal reasoning capability of large language models but are only focused on a very refined prediction target.”