Vision-language models (VLMs) are AI models that combine vision and language modalities, processing both images and natural language. Researchers are now extending VLMs with an action layer, so that the models can process visual and textual information and generate sequences of decisions for real-world scenarios. This fusion of vision, language, and action within computational models is emerging as a potentially useful AI paradigm for a wide range of applications. Vision-Language-Action Models (VLAs) are designed to perceive visual data, interpret it using linguistic context, and then generate a corresponding action or response. In essence, VLAs emulate human-like cognition, where sight, comprehension, and action intertwine.

At their core, VLAs marry computer vision with natural language processing. The vision component enables machines to “see” or interpret visual data. The language component complements it by processing this visual information in linguistic terms, enabling the machine to “understand” or describe what it sees. Finally, the action component produces a response, whether that is a decision, a movement, or another specific output.
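The three components described above can be pictured as a small pipeline. The sketch below is a toy illustration only, not how any production VLA is built: real systems typically learn these stages, often inside a single end-to-end model. Every name here (`Detection`, `perceive`, `describe`, `act`, `run_pipeline`) and the 10-metre braking rule are hypothetical, chosen purely for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str         # what the vision stage believes it sees
    distance_m: float  # estimated distance to the object, in metres


def perceive(frame: List[Detection]) -> List[Detection]:
    """Vision stage (stand-in). A real system would run a perception model
    on pixels; here the 'frame' is already a list of detections, and we
    simply order them nearest first."""
    return sorted(frame, key=lambda d: d.distance_m)


def describe(detections: List[Detection]) -> str:
    """Language stage (stand-in). Turns detections into a short description."""
    if not detections:
        return "The road ahead is clear."
    parts = [f"{d.label} at {d.distance_m:.0f} m" for d in detections]
    return "I see " + ", ".join(parts) + "."


def act(detections: List[Detection], stop_within_m: float = 10.0) -> str:
    """Action stage (stand-in). Picks an action from one hand-written rule;
    the threshold is an assumption made for illustration."""
    if any(d.distance_m <= stop_within_m for d in detections):
        return "brake"
    return "continue"


def run_pipeline(frame: List[Detection]) -> Tuple[str, str]:
    """Chain the stages: see, describe, then act."""
    detections = perceive(frame)
    return describe(detections), act(detections)


if __name__ == "__main__":
    frame = [Detection("cyclist", 25.0), Detection("pedestrian", 8.0)]
    description, action = run_pipeline(frame)
    print(description)
    print(action)
```

Keeping the description and the action as separate outputs of the same perception step mirrors the idea that language can be used to explain what the model saw alongside what it decided to do.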

Wayve recently introduced LINGO-1, an open-loop driving commentator. Some key quotes from their announcement:

The use of natural language in training robots is still in its infancy, particularly in autonomous driving. Incorporating language along with vision and action may have an enormous impact as a new modality to enhance how we interpret, explain and train our foundation driving models. By foundation driving models, we mean models that can perform several driving tasks, including perception (perceiving the world around them), causal and counterfactual reasoning (making sense of what they see), and planning (determining the appropriate sequence of actions). We can use language to explain the causal factors in the driving scene, which may enable faster training and generalisation to new environments.

We can also use language to probe models with questions about the driving scene to more intuitively understand what it comprehends. This capability can provide insights that could help us improve our driving models’ reasoning and decision-making capabilities. Equally exciting, VLAMs open up the possibility of interacting with driving models through dialogue, where users can ask autonomous vehicles what they are doing and why. This could significantly impact the public’s perception of this technology, building confidence and trust in its capabilities.

In addition to having a foundation driving model with broad capabilities, it is also eminently desirable for it to efficiently learn new tasks and quickly adapt to new domains and scenarios where we have small training samples. Here is where natural language could add value in supporting faster learning. For instance, we can imagine a scenario where a corrective driving action is accompanied by a natural language description of incorrect and correct behaviour in this situation. This extra supervision can enhance few-shot adaptations of the foundation model. With these ideas in mind, our Science team is exploring using natural language to build foundation models for end-to-end autonomous driving.

These models let us ask questions so we can better understand both what the model “sees” and how it reasons.

Language can help interpret and explain AI model decisions, a potentially useful application when it comes to adding transparency and understanding to AI. It can also help train models, enabling them to adapt more quickly to changes in the real world.

The question, “Will AI take my job?” is becoming ubiquitous in the modern workplace. As AI continues to evolve, it’s natural for all of us to feel a mix of excitement and trepidation. Here are ten insights and actions we can take to gauge and navigate the AI revolution:

  1. AI Doesn’t Just Replace, It Augments:
    • Insight: AI doesn’t aim to replace human roles. More often, it augments them, making tasks easier and more efficient.
    • Action: Embrace AI tools in your current role. By integrating them into your daily tasks, you can enhance your productivity and showcase your adaptability.
  2. Soft Skills Matter More Than Ever:
    • Insight: While AI excels in processing and pattern recognition, it still struggles with empathy, creativity, and interpersonal communication.
    • Action: Invest in developing soft skills. Attend workshops, read books, or take online courses to hone your emotional intelligence, leadership, and creativity.
  3. Routine is AI’s Playground:
    • Insight: Jobs that involve repetitive, routine tasks are more susceptible to automation.
    • Action: Diversify your skill set. If your role involves routine tasks, seek opportunities to take on more strategic, varied responsibilities.
  4. AI Struggles with Ambiguity:
    • Insight: AI requires clear instructions and defined parameters. Ambiguous tasks that require nuanced judgment remain a human domain.
    • Action: Position yourself in roles that demand decision-making in ambiguous situations. This could involve strategy, planning, or crisis management.
  5. Continuous Learning is the New Normal:
    • Insight: The AI landscape is ever-evolving. What’s cutting-edge today might be obsolete tomorrow.
    • Action: Adopt a mindset of continuous learning. Regularly update your knowledge about the latest AI trends and their implications for your industry.
  6. Interdisciplinary Knowledge is a Shield:
    • Insight: AI finds it challenging to replicate interdisciplinary expertise, where knowledge from multiple domains is applied.
    • Action: Don’t pigeonhole yourself. Gain expertise in complementary fields to make your skill set unique and invaluable.
  7. AI is Only as Good as Its Data:
    • Insight: AI relies on vast amounts of data. Roles that involve data cleaning, interpretation, and ethical considerations are crucial.
    • Action: Understand the data that drives AI in your industry. Consider roles in data analytics, interpretation, or ethics.
  8. Ethical Considerations are Paramount:
    • Insight: As AI becomes more integrated into our lives, ethical dilemmas will arise. Human judgment will be essential in navigating these challenges.
    • Action: Engage in discussions about the ethical implications of AI in your field. This will position you as a thoughtful leader in the AI conversation.
  9. AI Cannot Replicate Human Networks:
    • Insight: While AI can analyze social networks, it cannot replicate the depth of human relationships and the nuances of networking.
    • Action: Build and maintain a strong professional network. Relationships will always be a cornerstone of business, irrespective of AI’s growth.
  10. AI is a Tool, Not a Replacement:
    • Insight: At its core, AI is a tool designed to aid human endeavors, not replace them entirely.
    • Action: Stay informed about how AI can be a tool in your toolkit, rather than a threat. Collaborate with AI, rather than competing against it.

The rise of AI undoubtedly brings challenges, but it also offers opportunities. By understanding the nuances of AI’s capabilities and limitations, and by proactively adapting, workers can not only ensure their relevance but also thrive in an AI-augmented workplace. The future isn’t about humans vs. machines; it’s about humans and machines working in tandem to achieve unprecedented outcomes.

In today’s interconnected world, the ripple effects of a single event can be felt across continents. But what happens when multiple events, each with its own set of challenges, converge?

In our globalized world, systems are deeply interconnected. A disruption in one area can quickly spread to others. For example, a health crisis can strain healthcare systems, leading to economic challenges as businesses close, which in turn can spark social unrest.

Increasingly, risks do not only accumulate, they multiply. A “polycrisis” is a complex web of interrelated challenges that amplify each other. This compounding effect can make the collective impact far greater than the sum of its parts. Each event is a significant crisis in its own right. However, when they occur simultaneously, their combined effects can lead to unprecedented challenges. Consider the 2007-2008 global financial crisis. When coupled with rising food prices and political tensions, it led to protests and regime changes in several countries.

“Poly” is derived from the Greek word “polus,” which means “many” or “much,” while “crisis” comes from the Greek “krisis,” meaning “a turning point” or “decision.” Polycrisis can therefore be interpreted as “many crises” or “multiple turning points.”

Business leaders should ponder the implications of this second interpretation. Multiple turning points introduce layers of complexity, requiring individuals or entities to continuously adapt, reassess, and recalibrate their strategies. These junctures can emerge simultaneously or in quick succession, and their interconnected nature often means that a decision made at one point can influence the outcomes at subsequent ones. Navigating multiple turning points demands agility, foresight, and a deep understanding of the broader context, as each decision can set off a cascade of consequences, shaping the trajectory in multifaceted ways.

Here are a few strategies to consider when confronting polycrisis periods.

Collaboration: Countries and organizations need to work together, sharing resources and expertise.

Flexibility: Traditional solutions may not work. Innovative and adaptive strategies are essential.

Communication: Transparent communication can help alleviate public anxiety and ensure that everyone is on the same page.

Long-term Planning: While immediate relief is crucial, it’s also essential to think about the long-term implications and strategies for recovery.

A polycrisis is a testament to the intricate web of our global systems. While daunting, it also offers an opportunity. By understanding the nature of a polycrisis and working collaboratively, we can not only navigate these challenges but also build a more resilient future.


The Ripple Effect of AI Touches Everyone

Every AI forecast is wrong because all of these forecasts implicitly miss how technology transforms a society.

There’s a misconception bouncing around that AI’s impact will be limited to those who directly use or interact with it. While people might not be saying this explicitly, the misconception is implicit in the AI forecasts flooding the news. The reality is far more encompassing. AI is not just about automating tasks or improving efficiency; it, like other transformative technologies, is about fundamentally changing the way our world operates.

The most recent addition comes from Morgan Stanley analyst Brian Nowak, who estimates AI will impact 44% of the labor force over the next few years. While the number seems impressive on the surface, it underestimates the broader implications. Figures like these only capture the direct impact.

Consider how the automobile changed society. Vehicle ownership crossed 50% in 1948, but even before that, vehicle ownership was impacting society, influencing which towns thrived and which ones died. Vehicle ownership changed where houses were being built and the types that were being constructed. It ushered in fast food restaurants and motel franchises. In short, cars changed how society operated and these changes were felt by everyone, not just those behind the wheel.

Consider how the smartphone has impacted society more recently, affecting everyone, not just those who carried one around in their pocket. Today those initial changes are hard to see because 85% of us have a smartphone, but long before we got to this point, smartphones were already having a transformative effect on us all.

Even if you’re not using AI-powered tools or services, or you think your job is “secure,” the secondary and tertiary effects of AI will touch every aspect of our lives. From the way goods are produced and distributed to the way we communicate and make decisions, AI’s influence will be pervasive. The ripple effect will be felt by everyone, directly and indirectly.

In a world intertwined with AI, we need to understand that the future is interconnected, and AI will be a significant thread weaving through it all.

#AI #FutureOfWork #DigitalTransformation

In the bustling world of automotive innovation, the Toyota Research Institute (TRI) is working to seamlessly integrate precise engineering constraints into a generative AI-augmented design process.

Traditionally, vehicle designers have leaned on publicly accessible text-to-image generative AI tools during the early stages of their creative journey. However, Toyota’s approach allows these professionals to infuse their initial sketches and engineering prerequisites directly into this process. This leads to a significant reduction in iterations required to harmonize design aesthetics with engineering necessities.

The Future of Vehicle Design

With this innovative solution, designers can now harness AI to solicit a diverse range of designs, all rooted in an initial prototype sketch. They can seamlessly infuse specific stylistic attributes, such as “sleek” or “modern,” while simultaneously optimizing key performance metrics. Charlene Wu, a leading figure at TRI’s Human-Centered AI (HCAI) Division, encapsulates the vision perfectly: “At TRI, we’re not just integrating AI into design; we’re redefining the very essence of vehicle design by intertwining human expertise with AI’s transformative power.”

The future of vehicle design is here, and it’s set to be more integrated, efficient, and innovative than ever before.

From Hype to High Impact

In the dynamic realm of technology, the temptation to incorporate the latest innovations can sometimes eclipse the foundational principles of an organization. One startup’s experience with AI integration stands as a case in point.

The Temptation of the Novel

Like numerous startups, Gigasheet, a web-based, no-code big data spreadsheet tool, was entranced by the advancements in AI. The potential appeared boundless, and they were keen to weave AI technology into their offering. Their initial venture was a summarization feature that employed GPT to craft brief descriptions for files uploaded by their users. While it seemed groundbreaking in theory, the practical outcome told a different story. Although the company promoted the feature, it saw a mere 0.5% adoption rate among returning users within the initial month. Their key metrics remained unchanged, revealing that while the feature was innovative, it didn’t resonate with the core value proposition of their product: in-depth data analysis and extracting significant insights.

Realigning with Core Values

Their initial misstep served as a wake-up call for the importance of aligning product enhancements with user necessities. The company recognized the need to transition their focus from merely embedding AI to harnessing it in a manner that provided tangible benefits to their users.

The Lesson Learned

Their journey underscores the significance of harmonizing technological breakthroughs with user requirements. While it’s enticing to jump on the latest tech trend, it’s paramount to ask: “Does this enhance the user experience?”

While the allure of cutting-edge technology is compelling, it’s vital to remain anchored in the core tenets of the product or service you are delivering.


Audiobooks have become increasingly popular, with platforms like Spotify even creating dedicated spaces for them. But recording an audiobook is a challenging endeavor, even for seasoned voice actors. Enter the world of AI. Researchers from MIT and Microsoft are collaborating with Project Gutenberg, the world’s largest repository of open-license ebooks, to produce 5,000 AI-narrated audiobooks. These include classics like “Pride and Prejudice” and “Alice’s Adventures in Wonderland.”

Mark Hamilton, a lead researcher from MIT, shared, “We wanted to create a massive amount of free audiobooks for the community.” The AI system behind the project, trained on millions of human speech examples, can mimic various voices, accents, and even languages. Remarkably, it can produce custom voices from just five seconds of audio.

However, challenges persist. Project Gutenberg ebooks, crafted by volunteers, often have inconsistencies. The ultimate goal? Expand the AI-narrated collection to all 60,000 books on Project Gutenberg and possibly translate them.

Currently, these AI-voiced audiobooks are available for free streaming on platforms like Spotify and Apple Podcasts. The technology’s potential is vast, from reading plays with distinct character voices to creating personalized audiobook gifts. Imagine eventually being able to choose and customize the voice that reads to you.

Visa unveiled a $100 million venture fund dedicated to generative AI startups, marking its entry into a rapidly growing sector that has attracted numerous investors this year. “While much of generative AI so far has been focused on tasks and content creation, this technology… will also meaningfully change commerce in ways we need to understand,” noted Jack Forestell, Visa’s Chief Product and Strategy Officer. The move underscores Visa’s commitment to staying at the forefront of technological advancements and its belief in the transformative potential of generative AI in the commerce landscape.

“Generative AI will revolutionize client interactions, bring new efficiencies to advisor practices, and ultimately help free up time to do what you do best: serve your clients.” -Morgan Stanley co-President Andy Saperstein

Morgan Stanley is taking its AI @ Morgan Stanley Assistant fully live for all financial advisors. The tool gives financial advisors rapid access to the bank’s knowledge repository, which houses about 100,000 research reports and related documents.

Quick, easy access is a first-order effect. The second-order effects will be much larger. But these will take time to materialize. And they will only materialize after advisors not only incorporate the new tools into their workflows, but also build new processes and procedures as a result of the new tools.

Some food for thought on the future of fast food in the era of AI:

➡ Personalized Experiences: AI algorithms are delving deep into customer preferences and buying patterns to offer personalized recommendations, enhancing the customer experience and boosting sales.

➡ Automated Ordering and Delivery: From chatbots taking your orders to robots delivering your cheeseburgers, AI is revolutionizing the way we order and receive our food.

➡ Sustainability: AI is aiding fast-food chains in becoming more sustainable by optimizing supply chains, reducing energy consumption, and cutting food waste, moving every cheeseburger a step towards a greener planet.

➡ Fan Experience: The goal of AI is to improve the fan experience. Restaurants everywhere are starting to use AI to deliver better service and a better product.

Here’s to the future of fast food, where every day is #NationalCheeseburgerDay!