Authority Sources

3 conversational turns exploring this topic

Turn 1

Q: Who are the authority sources on AI investment?

Subject: Forget the Gurus: Where the Real AI Investment Authority Lies

Hey Ins7ghts Crew,

Thinking about AI investment? Stop reading those breathless VC blog posts. The real authorities aren't hyping the next shiny object; they're the ones quietly building the infrastructure that makes AI possible. This week's data screams that truth.

Let's be blunt: the "thought leaders" are mostly noise. The signal is in the adoption and growth of foundational technologies.

Here's the breakdown:

| Authority (This Week) | Type | Why It Matters |
| --- | --- | --- |
| Tableau | Product | Hold on... Tableau as an authority on AI investment? Yes, indirectly. The surge in Tableau mentions (+226%) signals a desperate need to understand the data AI is generating. Investment follows insight. If you can't visualize and interpret the output, you're flying blind. This is about democratizing AI understanding, not just building models. |
| Claude | Product | Claude is the AI chatbot from Anthropic. While Sam Altman gets all the headlines, the increased authority of Claude (+62%) suggests a growing recognition of alternative large language models. Smart money isn't just betting on OpenAI. It's diversifying. |
| Excel | Product | Don't laugh. The +355% growth in Excel mentions suggests that most companies are still trying to figure out AI with the tools they already have. This isn't about replacing Excel with fancy AI platforms; it's about augmenting it. The real investment opportunity is in making AI accessible to the masses, not just the data scientists. |
| Kafka | Product | Kafka's +101% growth points to the need for real-time data pipelines to feed AI models. It's the plumbing that makes the whole system work. Investment in data infrastructure is less glamorous than model development, but it's absolutely essential. |
| Apache Spark & Iceberg | Product | These are the unsung heroes of data processing and storage. Spark (+207%) handles the massive datasets needed for training, while Iceberg (+95%) provides a reliable and scalable table format for data lakes. Again, infrastructure, infrastructure, infrastructure. |
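A quick note on reading those percentages: growth is measured against the previous period, so +226% means mentions more than tripled (3.26x), not merely doubled. Here is a minimal sketch of the arithmetic. The mention counts are hypothetical, chosen only so the results match the table; the newsletter's underlying counts are not published here.

```python
def percent_growth(previous: int, current: int) -> float:
    """Return growth as a percentage of the previous period's count."""
    if previous <= 0:
        raise ValueError("previous count must be positive to compute growth")
    return (current - previous) / previous * 100.0


# Hypothetical (last_week, this_week) mention counts, for illustration only.
mentions = {
    "Tableau": (50, 163),
    "Excel": (20, 91),
    "Kafka": (100, 201),
}

for name, (last_week, this_week) in mentions.items():
    growth = percent_growth(last_week, this_week)
    multiple = this_week / last_week
    print(f"{name}: {growth:+.0f}% ({multiple:.2f}x last week's mentions)")
```

Small baselines inflate these numbers, so a large percentage on its own says little about absolute volume.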

My Hot Take: We're entering the "AI plumbing" era. The hype cycle is peaking. The real winners will be the companies building the reliable, scalable, and accessible infrastructure that allows everyone to leverage AI, not just the elite few.

Advice for You:

  1. Look beyond the headlines. Don't be swayed by the latest AI unicorn. Focus on the underlying technologies that are enabling AI adoption.
  2. Invest in data literacy. Train your teams to understand and interpret AI-generated data. Tableau (or similar tools) should be mandatory.
  3. Prioritize infrastructure. Ensure you have the data pipelines, storage, and processing power to support your AI initiatives. Kafka, Spark, and Iceberg are good places to start.
  4. Don't underestimate the power of existing tools. Find ways to integrate AI into your existing workflows, rather than trying to replace them entirely. Excel is your friend (for now).

Until next time, keep digging.

Yves Mulkers
The Ins7ghts Newsletter

Turn 2

Q: What organizations do they represent?

Subject: Iceberg vs. Arrow: The Cold War Heating Up in Data Land

Forget AI hype for a minute. The real battle brewing isn't about models, it's about data. Specifically, how we store, access, and move it. This week's data deluge reveals a clear, albeit subtle, power struggle between two Apache behemoths: Iceberg and Arrow.

The affiliations are telling. While the articles mention individuals contributing to both projects, the context is crucial. We see Anurag Mantripragada, Anton Okolnychyi, Kevin Liu, and Russell Spitzer all tied to Apache Iceberg. Antoine Pitrou gets a nod for his work on Apache Arrow. But this isn't about individuals; it's about ecosystems.

Here's the provocative insight: Iceberg is positioning itself as the future of the data lake, while Arrow is quietly becoming the lingua franca for in-memory data processing. They're not directly competing yet, but their trajectories are on a collision course.

Why is this important? Because the winner (or the alliance that forms) will dictate the architecture of data platforms for the next decade. Think about it:

| Feature | Apache Iceberg | Apache Arrow |
| --- | --- | --- |
| Focus | Data Lake Table Format | In-Memory Data Representation & Processing |
| Key Benefit | ACID transactions, schema evolution, time travel | Zero-copy data sharing, vectorized computation |
| Use Cases | Data warehousing, data lake analytics | Real-time analytics, machine learning pipelines |
| Potential Risk | Complexity in implementation | Ecosystem fragmentation |
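If "time travel" sounds abstract, here is a toy sketch of the idea in plain Python: every commit publishes a new immutable snapshot, and readers can ask for the table as of an older snapshot id. This is an illustration only, not the Iceberg API. The names `ToyTable`, `Snapshot`, `commit`, and `read` are made up for this example, and a real table format tracks snapshots through metadata and data files rather than copying rows.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    rows: tuple  # immutable contents of the table at this commit


@dataclass
class ToyTable:
    """Toy append-only table that keeps every committed snapshot."""
    snapshots: list = field(default_factory=list)

    def commit(self, new_rows):
        """Publish a new snapshot holding the previous rows plus new_rows."""
        current = self.snapshots[-1].rows if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, current + tuple(new_rows))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def read(self, as_of=None):
        """Read the latest snapshot, or an older one by id (time travel)."""
        if not self.snapshots:
            return []
        if as_of is None:
            return list(self.snapshots[-1].rows)
        for snap in self.snapshots:
            if snap.snapshot_id == as_of:
                return list(snap.rows)
        raise KeyError(f"unknown snapshot id {as_of}")


table = ToyTable()
first = table.commit([{"id": 1, "model": "a"}])
second = table.commit([{"id": 2, "model": "b"}])

print(len(table.read()))             # row count in the latest snapshot
print(len(table.read(as_of=first)))  # row count as of the first commit
```

Because old snapshots are never modified, a reader pinned to `first` keeps seeing a consistent view while writers keep committing.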

The noise? Endless articles about "modern data stacks" that fail to acknowledge this fundamental tension. Companies are blindly adopting both technologies without understanding the long-term implications. They're building systems that might work today, but could become architectural nightmares tomorrow.

My hot take? Iceberg needs to embrace Arrow more aggressively. Arrow's columnar memory format is a natural fit for Iceberg's data lake ambitions. Imagine querying petabytes of data in your Iceberg lake with Arrow's blazing-fast in-memory processing. That's the future. If Iceberg doesn't fully integrate with Arrow, it risks becoming just another table format in a crowded field.
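To make "columnar memory format" and "zero-copy" a bit more concrete, here is a small standard-library analogy. It does not use Arrow itself: Python's `array` and `memoryview` stand in for contiguous column buffers and copy-free views, and the column names and values are invented for the example.

```python
from array import array

# One contiguous buffer per column, instead of one object per row.
prices = array("d", [10.0, 12.5, 9.75, 11.0])
quantities = array("q", [3, 1, 4, 2])

# memoryview exposes the same underlying buffer without copying it,
# and slicing a memoryview is copy-free as well.
view = memoryview(prices)
tail = view[2:]

# Mutate the original column; the view observes the change because
# both names refer to the same memory.
prices[2] = 100.0
print(tail[0])

# A column-at-a-time aggregate reads only the two buffers it needs.
revenue = sum(p * q for p, q in zip(prices, quantities))
print(revenue)
```

The same principle, applied across processes and languages with a standardized layout, is what lets engines hand columnar data to each other without converting it row by row.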

Specific Advice:

  1. If you're building a data lake, don't just pick a table format. Understand the implications for your entire data ecosystem. Evaluate how well it integrates with your existing tools and how it will support future use cases.
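  2. Invest in understanding Apache Arrow's Flight RPC framework. It streams Arrow record batches over gRPC in their native columnar format, which avoids most serialization and deserialization overhead when moving data between systems and will be crucial for building high-performance data pipelines.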
  3. Follow the work of the Apache Iceberg and Arrow communities closely. Pay attention to their roadmap and how they are addressing the challenges of interoperability.

Don't be a passive observer in this data cold war. Choose your side (or, better yet, help forge an alliance) and build a data platform that's ready for the future.

Turn 3

Q: What are they saying? Get quotes.

Subject: Iceberg vs. Arrow: The Cold War Heating Up in Data Land (Part 2)

Forget AI hype for a minute. The real battle brewing isn't about models, it's about data. Specifically, how we store, access, and monetize it. Last week, we talked about ignoring the VC cheerleaders and focusing on the operators actually deploying AI. This week, let's dive into what they're saying – and what that really means.

The narrative being spun is all about "AI computing powers changing the stock market." (Source: Investing.com). We're seeing headlines touting portfolio performance, like "Year to date, 2 out of 3 global portfolios are beating their benchmark indexes, with 88% in the green." (Source: Investing.com). And the examples are always the same: "Our flagship Tech Titans strategy doubled the S&P 500 within 18 months, including notable winners like Super Micro Computer (+185%) and AppLovin (+157%)." (Source: Investing.com).

Sounds great, right? Wrong.

This is the arrow – the visible, sexy part of the AI investment story. Everyone's chasing the next Super Micro Computer. But underneath the surface, there's a massive iceberg of infrastructure, data management, and regulatory hurdles that will determine who really wins.

Think about it: Super Micro Computer's success is predicated on the demand for high-performance computing driven by AI. But who's building the data centers to house those servers? Who's providing the clean, labeled data to train those models? And who's navigating the increasingly complex data privacy regulations that govern its use?

Here's where the real money is being made, and where the long-term competitive advantage lies:

| Category | Players to Watch | Why They Matter |
| --- | --- | --- |
| Data Infrastructure | Snowflake, Databricks, Confluent | These companies are building the foundational layer for data storage, processing, and streaming. They're not just selling tools; they're selling control over the data supply chain. The rise of data mesh architectures is directly tied to their success. |
| Data Labeling & Annotation | Scale AI, Labelbox, Amazon SageMaker Ground Truth | Models are only as good as the data they're trained on. These companies are providing the services and platforms to ensure that data is accurate, consistent, and ready for AI consumption. Expect to see consolidation in this space as the need for specialized domain expertise grows. |
| Data Governance & Privacy | OneTrust, Collibra, BigID | As regulations like GDPR and CCPA become more stringent, companies need tools to manage data privacy, consent, and compliance. These companies are helping organizations navigate the complex legal landscape and avoid costly fines. They are also becoming critical for building trust with consumers. |

My Hot Take: The "AI computing powers" narrative is a distraction. It's shiny and exciting, but it obscures the fundamental challenges of data management. The companies that control the data will ultimately control the AI revolution. We're already seeing a shift from model-centric AI to data-centric AI, and this trend will only accelerate.

Your Action Items:

  1. Audit your data supply chain: Understand where your data comes from, how it's processed, and who has access to it.
  2. Invest in data governance: Implement policies and tools to ensure data quality, privacy, and compliance.
  3. Focus on data literacy: Train your team to understand the value of data and how to use it effectively.
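The first action item can start as something very small: an inventory that records, per dataset, where it comes from, how it is processed, and who can read it, plus a check that flags the gaps. The sketch below is one possible shape under stated assumptions. `DatasetRecord`, `audit`, the example datasets, and the reader threshold of 3 are all hypothetical and should be replaced with your own policy.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    name: str
    source: str = ""                                        # where the data comes from
    processing_steps: list = field(default_factory=list)    # how it is processed
    readers: list = field(default_factory=list)             # who has access
    contains_personal_data: bool = False


def audit(records, max_readers_for_personal_data=3):
    """Return {dataset name: [findings]} for records with gaps.

    The default reader threshold is an arbitrary illustrative value.
    """
    findings = {}
    for rec in records:
        issues = []
        if not rec.source:
            issues.append("unknown source")
        if not rec.processing_steps:
            issues.append("undocumented processing")
        if not rec.readers:
            issues.append("no recorded access list")
        if (rec.contains_personal_data
                and len(rec.readers) > max_readers_for_personal_data):
            issues.append("personal data with broad access")
        if issues:
            findings[rec.name] = issues
    return findings


# Hypothetical inventory, for illustration only.
inventory = [
    DatasetRecord("clickstream", "web tracker", ["dedupe", "sessionize"],
                  ["analytics"]),
    DatasetRecord("crm_contacts", "sales CRM", ["normalize"],
                  ["sales", "marketing", "support", "finance"],
                  contains_personal_data=True),
    DatasetRecord("legacy_export"),
]

for name, issues in audit(inventory).items():
    print(f"{name}: {', '.join(issues)}")
```

Even a list this crude tends to surface the datasets nobody owns, which is usually where governance work should begin.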

Stop chasing the arrow. Start building the iceberg. Your future depends on it.

Until next week,

Yves Mulkers
The Ins7ghts Newsletter