In January 2026, Microsoft announced the general availability of Azure Databricks Agent Bricks—a native capability for creating, deploying, and managing AI agents directly within the Databricks platform. This integration unifies data engineering, machine learning, and agentic AI development in a single environment, enabling data teams to build intelligent agents that have native access to lakehouse […]
Category: Data Engineering
Microsoft Acquires Osmos: Agentic Data Engineering Comes to Microsoft Fabric
In January 2026, Microsoft announced the acquisition of Osmos, an agentic AI data engineering platform that automates complex data transformation, integration, and quality tasks. This acquisition signals Microsoft’s commitment to bringing autonomous AI agents into the data engineering workflow within Microsoft Fabric. For data engineers struggling with repetitive ETL development, schema mapping, and data quality […]
EF Core 10: Vector Search, LeftJoin/RightJoin, and Full-Text Search on Cosmos DB
Entity Framework Core 10, released alongside .NET 10, introduces features that position it as a first-class choice for AI-powered applications. The headline addition—vector search support—enables semantic similarity queries directly in LINQ, while new LeftJoin/RightJoin operators and Cosmos DB full-text search round out a release focused on modern data access patterns. This comprehensive guide explores each […]
Semantic Search in Production: Embedding Strategies for Enterprise RAG
The quality of your RAG (Retrieval-Augmented Generation) system depends more on your embedding strategy than on your choice of LLM. Poor embeddings mean irrelevant context retrieval, which no amount of prompt engineering can fix. This comprehensive guide explores production-ready embedding strategies—covering model selection, chunking approaches, hybrid search techniques, and optimization patterns that directly impact retrieval […]
Data Quality for AI: Ensuring High-Quality Training Data
Data quality determines AI model performance. After managing data quality for 100+ AI projects, I’ve learned what matters. Here’s the complete guide to ensuring high-quality training data.
Figure 1: Data Quality Framework
Why Data Quality Matters
Data quality directly impacts model performance:
Accuracy: Poor data leads to poor predictions
Bias: Biased data creates biased models […]
Real-Time Data Streaming with Apache Kafka: Building Production Event Pipelines in Python
Introduction: Real-time data streaming has become essential for modern data architectures, enabling immediate insights and actions on data as it arrives. This comprehensive guide explores production streaming patterns using Apache Kafka and Python, covering producer/consumer design, stream processing with Flink, exactly-once semantics, and operational best practices. After building streaming platforms processing billions of events daily, […]