AI Developments and Innovations: OpenAI, Google Deepmind, and More

OpenAI is planning to develop a browser with an integrated AI chatbot and AI search. The company has presented the concept to potential clients like Condé Nast, Eventbrite, and Priceline. This move would make OpenAI a direct competitor to Google, which dominates the browser market with Chrome. However, an immediate release is not planned. Meanwhile, Google is under pressure from competition regulators. The US Department of Justice is demanding the sale of Chrome and possibly Android. The EU Commission is also investigating Google’s business practices concerning the Digital Markets Act. OpenAI has recruited experienced developers for the browser project, including Ben Goodger (formerly Firefox and Chrome) and Darin Fisher (The Browser Company, Mozilla, Google). The project aims to be part of a “Natural Language Web,” where users can interact with websites like a chatbot.

The AI startup Anthropic has deepened its collaboration with Amazon Web Services. Amazon is investing an additional 4 billion US dollars, making it the primary cloud and training partner for Anthropic. Anthropic engineers will work closely with Annapurna Labs, a subsidiary of AWS, known for its Trainium chips specialized in training machine learning models. The partnership aims to further optimize hardware efficiency. Anthropic plans to benefit from this in training its foundation models around the Claude family. Amazon’s total investment in the startup increases to 8 billion US dollars, though it remains a minority investor. Other investors include Alphabet, Microsoft, Apple, and Nvidia.

Google Deepmind is working on understanding how large language models function. A team focused on mechanistic interpretability is developing tools to look under the hood of AI. The tool Gemma Scope helps researchers understand what happens when generative systems produce output. The goal is to reverse-engineer the algorithms within these systems. To find features in Google’s AI model Gemma, Deepmind applied a tool called “Sparse Autoencoder” to its layers. This tool acts like a microscope, magnifying these layers to view their details. The challenge is deciding how granular the autoencoders should be. Deepmind’s solution was to run Sparse Autoencoders of different sizes to vary the number of features the autoencoder should find. This helps identify and fix biases or errors in language models. Mechanistic interpretability could be a plausible way to achieve alignment, ensuring the AI does what we expect. Gemma and the autoencoders are open-source, encouraging further research into the model’s internal logic.

A US Congress commission recommends a massive “Manhattan Project”-like program for developing Artificial General Intelligence (AGI) to compete with China. Leading AI companies should receive long-term contracts and funding. To protect against Chinese industrial espionage, stricter laws are proposed, such as approving Chinese investments in US biotech firms, restricting imports of certain Chinese technologies, and prohibiting Chinese investors from taking board seats in strategic US technology sectors. China considers AI central to its future military strategy to exploit US military vulnerabilities. The country has invested heavily in AI education and leads in the number of AI research articles. It’s uncertain if Congress will follow the proposal.

Apple is developing a new version of its digital assistant Siri with advanced language models. Internally called “LLM Siri,” this version aims to enable more human-like conversations and handle complex queries. According to Bloomberg, the new Siri version will be announced in 2025 and available to users in spring 2026. Apple is currently testing it as a separate app on iPhones, iPads, and Macs. The technology is expected to eventually replace the existing Siri interface. Before launch, Apple plans interim steps, like integrating ChatGPT into Apple Intelligence. The new Siri will focus on privacy, although Apple might still offer access to specialized third-party AI systems.

The AI startup Genmo has released its video model Mochi 1, the largest publicly available AI model for video generation. With 10 billion parameters, it sets new standards in motion quality and text instruction implementation. Mochi 1 can create videos with 30 FPS and up to 5.4 seconds in 480p resolution. It realistically simulates physical effects like liquids and fur or hair movement, optimized for photorealistic content. A 720p version is planned for 2024. The model is based on a new “Asymmetric Diffusion Transformer” architecture that processes text and video separately. In benchmarks, it achieves higher accuracy in prompt implementation and more realistic movements than competing models. Code and weights are available under the Apache-2.0 license.

A cryptocurrency enthusiast lost 2500 US dollars after using code from ChatGPT that contained a fraudulent Solana API. He intended to program a “Bump Bot” for cryptocurrency trading and trusted the suggested code, which transmitted his private key via the API. The incident reveals several basic security errors: using a production account for development and exposing a private key. The fraudster withdrew all crypto assets from the wallet within 30 minutes. The affected person reported the incident to OpenAI and the fraudulent repository to GitHub, which was then removed. The case highlights the need for critical review of AI-generated results.

Generative AI is credited with making digital office work more efficient and saving time. However, a study by Intel suggests otherwise. The survey found that users of new AI PCs, equipped with NPU (Neural Processing Unit), spent more time on computer tasks than those using regular PCs or laptops. Intel described this as a “worrying statistic.” While there is potential for time savings with AI, many users still spend a lot of time figuring out how to communicate with AI chatbots to achieve desired results. Intel concludes that there is a need for more education. Companies offering AI-supported products should provide more information on implementation and use to fully realize AI’s potential.

Microsoft has released AI Shell as a public preview. The project equips the shell with AI capabilities, offering interaction with AI agents from Azure OpenAI and Copilot in Azure. It can be used as a standalone application or with Microsoft’s open-source shell PowerShell 7. AI Shell consists of a command-line shell interface, a framework for creating AI agents, a PowerShell module, and two integrated AI agents. The Azure-OpenAI agent can use any AI model provided by Azure OpenAI and is suitable for a wide range of queries, natural language interpretations, and code generation. The AI-supported tool Copilot in Azure is available as a preview and specializes in cloud-centric assistance. AI Shell is available for Windows, macOS, and Linux.