
Azure IoT Hub Device Management–Released to Public

Today Microsoft announced general availability of Azure IoT Hub Device Management. With this release, Azure IoT Hub customers gain access to the following features and functionality:

  • Device twin. Use a digital representation of your physical devices to synchronize device conditions and operator configuration between the cloud and device (a minimal sketch of the twin document follows this list).
  • Direct methods. Apply a direct, performant action on a connected device through the cloud.
  • Jobs. Broadcast and schedule device twin changes and methods to scale management operations across millions of devices.
  • Queries. Create real-time, dynamic reports across device twins and jobs to attest status and health for entire device collections, whether your devices are online or offline.
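
To make the device twin idea concrete, here is a minimal, hypothetical sketch of a twin document and of how operator-set (desired) and device-reported properties converge. The property names are illustrative placeholders, not taken from the IoT Hub schema.

# Hypothetical device twin document: the hub keeps "desired" properties (set from
# the cloud) and "reported" properties (written back by the device) per device.
device_twin = {
    "deviceId": "thermostat-01",
    "properties": {
        "desired": {"telemetryIntervalSec": 30, "$version": 4},   # operator configuration
        "reported": {"telemetryIntervalSec": 60, "firmware": "1.2.0", "$version": 9},
    },
}

def pending_configuration(twin: dict) -> dict:
    """Return desired settings the device has not yet reported back."""
    desired = twin["properties"]["desired"]
    reported = twin["properties"]["reported"]
    return {
        key: value
        for key, value in desired.items()
        if not key.startswith("$") and reported.get(key) != value
    }

print(pending_configuration(device_twin))  # {'telemetryIntervalSec': 30}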


Microsoft Visual Studio 2015 Update 3 (KB3165756) – Cumulative Servicing Release – 14.0.25431.01

As per Microsoft: "This cumulative servicing release provides fixes to Microsoft Visual Studio 2015 Update 3. These fixes address high-impact bugs that were either found by the product team or reported by the community."

Download the latest from here, and you can find the detailed list of fixes on the same site.

Embedding Models Deep Dive: From Sentence Transformers to Production Deployment

Introduction: Embeddings are the foundation of modern AI applications—they transform text, images, and other data into dense vectors that capture semantic meaning. Understanding how embedding models work, their strengths and limitations, and how to choose between them is essential for building effective search, RAG, and similarity systems. This guide covers the landscape of embedding models: from sentence transformers to OpenAI’s text-embedding-ada-002, from BERT-based models to instruction-tuned embedders. We’ll explore how to evaluate embeddings for your use case, optimize for latency and cost, fine-tune for domain-specific tasks, and deploy embedding services at scale. Whether you’re building semantic search, document clustering, or retrieval-augmented generation, these patterns will help you choose and use embeddings effectively.
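
Before diving in, here is a quick sketch of what "capturing semantic meaning" looks like in practice, assuming the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint; the printed similarity values are approximate, not guaranteed numbers.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast open model
sentences = [
    "The cat sat on the mat.",
    "A feline was resting on the rug.",
    "Quarterly revenue grew by 12%.",
]
vectors = model.encode(sentences, convert_to_numpy=True)

# Cosine similarity: the paraphrase pair scores much higher than the unrelated pair
print(util.cos_sim(vectors[0], vectors[1]).item())  # e.g. ~0.7
print(util.cos_sim(vectors[0], vectors[2]).item())  # e.g. ~0.05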

Figure: the embedding pipeline (tokenization, encoder model, pooling).
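
The same pipeline can be reproduced by hand with Hugging Face transformers. The sketch below is illustrative: the checkpoint name is just an example, and libraries such as sentence-transformers (used throughout this guide) wrap these exact steps for you.

# Minimal embedding pipeline: tokenize -> encode -> pool (mean over token vectors)
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)

def embed(text: str) -> torch.Tensor:
    # 1. Tokenization: text -> input_ids / attention_mask
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    # 2. Encoder model: contextual vector for every token
    with torch.no_grad():
        token_vectors = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden)
    # 3. Pooling: mask out padding, then average token vectors into one embedding
    mask = inputs["attention_mask"].unsqueeze(-1)
    pooled = (token_vectors * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(pooled, dim=-1).squeeze(0)

print(embed("Embeddings map text to vectors.").shape)  # torch.Size([384])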

Embedding Model Fundamentals

from dataclasses import dataclass, field
from typing import Any, Optional, Union
from abc import ABC, abstractmethod
import numpy as np

@dataclass
class EmbeddingResult:
    """Result of embedding operation."""
    
    vector: np.ndarray
    model: str
    dimensions: int
    tokens_used: int = 0
    metadata: dict = field(default_factory=dict)

class EmbeddingModel(ABC):
    """Abstract embedding model."""
    
    @property
    @abstractmethod
    def dimensions(self) -> int:
        """Get embedding dimensions."""
        pass
    
    @abstractmethod
    def embed(self, text: str) -> EmbeddingResult:
        """Embed single text."""
        pass
    
    @abstractmethod
    def embed_batch(self, texts: list[str]) -> list[EmbeddingResult]:
        """Embed batch of texts."""
        pass

class OpenAIEmbedding(EmbeddingModel):
    """OpenAI embedding models."""
    
    MODEL_DIMENSIONS = {
        "text-embedding-ada-002": 1536,
        "text-embedding-3-small": 1536,
        "text-embedding-3-large": 3072
    }
    
    def __init__(
        self,
        api_key: str,
        model: str = "text-embedding-3-small"
    ):
        from openai import OpenAI
        self.client = OpenAI(api_key=api_key)
        self.model = model
    
    @property
    def dimensions(self) -> int:
        return self.MODEL_DIMENSIONS.get(self.model, 1536)
    
    def embed(self, text: str) -> EmbeddingResult:
        """Embed with OpenAI."""
        
        response = self.client.embeddings.create(
            model=self.model,
            input=text
        )
        
        return EmbeddingResult(
            vector=np.array(response.data[0].embedding),
            model=self.model,
            dimensions=self.dimensions,
            tokens_used=response.usage.total_tokens
        )
    
    def embed_batch(self, texts: list[str]) -> list[EmbeddingResult]:
        """Batch embed with OpenAI."""
        
        response = self.client.embeddings.create(
            model=self.model,
            input=texts
        )
        
        results = []
        for data in response.data:
            results.append(EmbeddingResult(
                vector=np.array(data.embedding),
                model=self.model,
                dimensions=self.dimensions,
                tokens_used=response.usage.total_tokens // len(texts)
            ))
        
        return results

class SentenceTransformerEmbedding(EmbeddingModel):
    """Sentence Transformers models."""
    
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer(model_name)
        self.model_name = model_name
    
    @property
    def dimensions(self) -> int:
        return self.model.get_sentence_embedding_dimension()
    
    def embed(self, text: str) -> EmbeddingResult:
        """Embed with Sentence Transformers."""
        
        vector = self.model.encode(text, convert_to_numpy=True)
        
        return EmbeddingResult(
            vector=vector,
            model=self.model_name,
            dimensions=self.dimensions
        )
    
    def embed_batch(self, texts: list[str]) -> list[EmbeddingResult]:
        """Batch embed."""
        
        vectors = self.model.encode(texts, convert_to_numpy=True)
        
        return [
            EmbeddingResult(
                vector=v,
                model=self.model_name,
                dimensions=self.dimensions
            )
            for v in vectors
        ]

class CohereEmbedding(EmbeddingModel):
    """Cohere embedding models."""
    
    def __init__(
        self,
        api_key: str,
        model: str = "embed-english-v3.0"
    ):
        import cohere
        self.client = cohere.Client(api_key)
        self.model = model
        self._dimensions = 1024  # Default for v3
    
    @property
    def dimensions(self) -> int:
        return self._dimensions
    
    def embed(self, text: str) -> EmbeddingResult:
        """Embed with Cohere."""
        
        response = self.client.embed(
            texts=[text],
            model=self.model,
            input_type="search_document"
        )
        
        return EmbeddingResult(
            vector=np.array(response.embeddings[0]),
            model=self.model,
            dimensions=self.dimensions
        )
    
    def embed_batch(self, texts: list[str]) -> list[EmbeddingResult]:
        """Batch embed with Cohere."""
        
        response = self.client.embed(
            texts=texts,
            model=self.model,
            input_type="search_document"
        )
        
        return [
            EmbeddingResult(
                vector=np.array(emb),
                model=self.model,
                dimensions=self.dimensions
            )
            for emb in response.embeddings
        ]

class VoyageEmbedding(EmbeddingModel):
    """Voyage AI embeddings."""
    
    def __init__(
        self,
        api_key: str,
        model: str = "voyage-2"
    ):
        import voyageai
        self.client = voyageai.Client(api_key=api_key)
        self.model = model
        self._dimensions = 1024
    
    @property
    def dimensions(self) -> int:
        return self._dimensions
    
    def embed(self, text: str) -> EmbeddingResult:
        """Embed with Voyage."""
        
        result = self.client.embed(
            [text],
            model=self.model,
            input_type="document"
        )
        
        return EmbeddingResult(
            vector=np.array(result.embeddings[0]),
            model=self.model,
            dimensions=self.dimensions
        )
    
    def embed_batch(self, texts: list[str]) -> list[EmbeddingResult]:
        """Batch embed with Voyage."""
        
        result = self.client.embed(
            texts,
            model=self.model,
            input_type="document"
        )
        
        return [
            EmbeddingResult(
                vector=np.array(emb),
                model=self.model,
                dimensions=self.dimensions
            )
            for emb in result.embeddings
        ]

Instruction-Tuned Embeddings

from dataclasses import dataclass
from typing import Any, Optional

class InstructorEmbedding(EmbeddingModel):
    """Instructor embedding model with task instructions."""
    
    def __init__(self, model_name: str = "hkunlp/instructor-large"):
        from InstructorEmbedding import INSTRUCTOR
        self.model = INSTRUCTOR(model_name)
        self.model_name = model_name
    
    @property
    def dimensions(self) -> int:
        return 768  # instructor-large
    
    def embed(
        self,
        text: str,
        instruction: str = "Represent the document for retrieval:"
    ) -> EmbeddingResult:
        """Embed with instruction."""
        
        vector = self.model.encode([[instruction, text]])[0]
        
        return EmbeddingResult(
            vector=vector,
            model=self.model_name,
            dimensions=self.dimensions,
            metadata={"instruction": instruction}
        )
    
    def embed_batch(
        self,
        texts: list[str],
        instruction: str = "Represent the document for retrieval:"
    ) -> list[EmbeddingResult]:
        """Batch embed with instruction."""
        
        inputs = [[instruction, text] for text in texts]
        vectors = self.model.encode(inputs)
        
        return [
            EmbeddingResult(
                vector=v,
                model=self.model_name,
                dimensions=self.dimensions,
                metadata={"instruction": instruction}
            )
            for v in vectors
        ]
    
    def embed_query(self, query: str) -> EmbeddingResult:
        """Embed query with query instruction."""
        
        return self.embed(
            query,
            instruction="Represent the question for retrieving relevant documents:"
        )
    
    def embed_document(self, document: str) -> EmbeddingResult:
        """Embed document with document instruction."""
        
        return self.embed(
            document,
            instruction="Represent the document for retrieval:"
        )

class E5Embedding(EmbeddingModel):
    """E5 embedding models."""
    
    def __init__(self, model_name: str = "intfloat/e5-large-v2"):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer(model_name)
        self.model_name = model_name
    
    @property
    def dimensions(self) -> int:
        return self.model.get_sentence_embedding_dimension()
    
    def embed(self, text: str, prefix: str = "passage: ") -> EmbeddingResult:
        """Embed with E5 prefix."""
        
        # E5 requires specific prefixes
        prefixed_text = f"{prefix}{text}"
        vector = self.model.encode(prefixed_text, convert_to_numpy=True)
        
        return EmbeddingResult(
            vector=vector,
            model=self.model_name,
            dimensions=self.dimensions,
            metadata={"prefix": prefix}
        )
    
    def embed_batch(
        self,
        texts: list[str],
        prefix: str = "passage: "
    ) -> list[EmbeddingResult]:
        """Batch embed with E5."""
        
        prefixed = [f"{prefix}{t}" for t in texts]
        vectors = self.model.encode(prefixed, convert_to_numpy=True)
        
        return [
            EmbeddingResult(
                vector=v,
                model=self.model_name,
                dimensions=self.dimensions,
                metadata={"prefix": prefix}
            )
            for v in vectors
        ]
    
    def embed_query(self, query: str) -> EmbeddingResult:
        """Embed query with query prefix."""
        return self.embed(query, prefix="query: ")
    
    def embed_document(self, document: str) -> EmbeddingResult:
        """Embed document with passage prefix."""
        return self.embed(document, prefix="passage: ")

class BGEEmbedding(EmbeddingModel):
    """BGE (BAAI General Embedding) models."""
    
    def __init__(self, model_name: str = "BAAI/bge-large-en-v1.5"):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer(model_name)
        self.model_name = model_name
    
    @property
    def dimensions(self) -> int:
        return self.model.get_sentence_embedding_dimension()
    
    def embed(self, text: str) -> EmbeddingResult:
        """Embed with BGE."""
        
        vector = self.model.encode(text, normalize_embeddings=True)
        
        return EmbeddingResult(
            vector=vector,
            model=self.model_name,
            dimensions=self.dimensions
        )
    
    def embed_batch(self, texts: list[str]) -> list[EmbeddingResult]:
        """Batch embed with BGE."""
        
        vectors = self.model.encode(texts, normalize_embeddings=True)
        
        return [
            EmbeddingResult(
                vector=v,
                model=self.model_name,
                dimensions=self.dimensions
            )
            for v in vectors
        ]
    
    def embed_query(self, query: str) -> EmbeddingResult:
        """Embed query with instruction prefix."""
        
        # BGE uses instruction prefix for queries
        instruction = "Represent this sentence for searching relevant passages: "
        prefixed = f"{instruction}{query}"
        
        vector = self.model.encode(prefixed, normalize_embeddings=True)
        
        return EmbeddingResult(
            vector=vector,
            model=self.model_name,
            dimensions=self.dimensions,
            metadata={"type": "query"}
        )

Embedding Evaluation

from dataclasses import dataclass, field
from typing import Any, Optional
import numpy as np

@dataclass
class EvaluationResult:
    """Embedding evaluation result."""
    
    metric: str
    score: float
    details: dict = field(default_factory=dict)

class EmbeddingEvaluator:
    """Evaluate embedding quality."""
    
    def __init__(self, model: EmbeddingModel):
        self.model = model
    
    def evaluate_similarity(
        self,
        pairs: list[tuple[str, str]],
        labels: list[float]
    ) -> EvaluationResult:
        """Evaluate on similarity task."""
        
        from scipy.stats import spearmanr
        
        predictions = []
        
        for text1, text2 in pairs:
            emb1 = self.model.embed(text1).vector
            emb2 = self.model.embed(text2).vector
            
            # Cosine similarity
            sim = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
            predictions.append(sim)
        
        # Spearman correlation
        correlation, p_value = spearmanr(predictions, labels)
        
        return EvaluationResult(
            metric="spearman_correlation",
            score=correlation,
            details={"p_value": p_value}
        )
    
    def evaluate_retrieval(
        self,
        queries: list[str],
        documents: list[str],
        relevance: dict[int, list[int]]  # query_idx -> relevant_doc_indices
    ) -> EvaluationResult:
        """Evaluate retrieval performance."""
        
        # Embed all documents
        doc_embeddings = [self.model.embed(d).vector for d in documents]
        doc_matrix = np.array(doc_embeddings)
        
        # Evaluate each query
        mrr_scores = []
        recall_at_k = {1: [], 5: [], 10: []}
        
        for q_idx, query in enumerate(queries):
            query_emb = self.model.embed(query).vector
            
            # Cosine similarity against every document (a plain dot product assumes
            # the embeddings are already normalized, which not every model guarantees)
            norms = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_emb)
            similarities = np.dot(doc_matrix, query_emb) / np.maximum(norms, 1e-12)
            ranked_indices = np.argsort(similarities)[::-1]
            
            relevant = set(relevance.get(q_idx, []))
            
            # MRR
            for rank, doc_idx in enumerate(ranked_indices):
                if doc_idx in relevant:
                    mrr_scores.append(1.0 / (rank + 1))
                    break
            else:
                mrr_scores.append(0.0)
            
            # Recall@K
            for k in recall_at_k:
                top_k = set(ranked_indices[:k])
                recall = len(top_k & relevant) / len(relevant) if relevant else 0
                recall_at_k[k].append(recall)
        
        return EvaluationResult(
            metric="retrieval",
            score=np.mean(mrr_scores),
            details={
                "mrr": np.mean(mrr_scores),
                "recall@1": np.mean(recall_at_k[1]),
                "recall@5": np.mean(recall_at_k[5]),
                "recall@10": np.mean(recall_at_k[10])
            }
        )
    
    def evaluate_clustering(
        self,
        texts: list[str],
        labels: list[int]
    ) -> EvaluationResult:
        """Evaluate clustering quality."""
        
        from sklearn.metrics import silhouette_score, adjusted_rand_score
        from sklearn.cluster import KMeans
        
        # Get embeddings
        embeddings = np.array([self.model.embed(t).vector for t in texts])
        
        # Cluster
        n_clusters = len(set(labels))
        kmeans = KMeans(n_clusters=n_clusters, random_state=42)
        predicted = kmeans.fit_predict(embeddings)
        
        # Evaluate
        silhouette = silhouette_score(embeddings, predicted)
        ari = adjusted_rand_score(labels, predicted)
        
        return EvaluationResult(
            metric="clustering",
            score=ari,
            details={
                "adjusted_rand_index": ari,
                "silhouette_score": silhouette
            }
        )

class EmbeddingBenchmark:
    """Benchmark multiple embedding models."""
    
    def __init__(self, models: dict[str, EmbeddingModel]):
        self.models = models
    
    def benchmark_latency(
        self,
        texts: list[str],
        batch_sizes: list[int] = [1, 8, 32]
    ) -> dict[str, dict]:
        """Benchmark embedding latency."""
        
        import time
        
        results = {}
        
        for name, model in self.models.items():
            results[name] = {}
            
            for batch_size in batch_sizes:
                times = []
                
                for i in range(0, len(texts), batch_size):
                    batch = texts[i:i + batch_size]
                    
                    start = time.time()
                    model.embed_batch(batch)
                    elapsed = time.time() - start
                    
                    times.append(elapsed / len(batch))
                
                results[name][f"batch_{batch_size}"] = {
                    "avg_ms": np.mean(times) * 1000,
                    "p99_ms": np.percentile(times, 99) * 1000
                }
        
        return results
    
    def benchmark_quality(
        self,
        similarity_data: Optional[tuple[list, list]] = None,
        retrieval_data: Optional[tuple[list, list, dict]] = None
    ) -> dict[str, dict]:
        """Benchmark embedding quality."""
        
        results = {}
        
        for name, model in self.models.items():
            evaluator = EmbeddingEvaluator(model)
            results[name] = {}
            
            if similarity_data:
                pairs, labels = similarity_data
                sim_result = evaluator.evaluate_similarity(pairs, labels)
                results[name]["similarity"] = sim_result.score
            
            if retrieval_data:
                queries, docs, relevance = retrieval_data
                ret_result = evaluator.evaluate_retrieval(queries, docs, relevance)
                results[name]["retrieval"] = ret_result.details
        
        return results
    
    def compare_dimensions(self) -> dict[str, int]:
        """Compare model dimensions."""
        
        return {name: model.dimensions for name, model in self.models.items()}

Embedding Optimization

from dataclasses import dataclass
from typing import Any, Optional
import numpy as np

class EmbeddingCache:
    """Cache embeddings to avoid recomputation."""
    
    def __init__(self, model: EmbeddingModel, cache_size: int = 10000):
        self.model = model
        self.cache_size = cache_size
        self.cache: dict[str, np.ndarray] = {}
        self.access_order: list[str] = []
    
    def embed(self, text: str) -> EmbeddingResult:
        """Embed with caching."""
        
        cache_key = self._hash(text)
        
        if cache_key in self.cache:
            # Update access order
            self.access_order.remove(cache_key)
            self.access_order.append(cache_key)
            
            return EmbeddingResult(
                vector=self.cache[cache_key],
                model=self.model.model_name if hasattr(self.model, 'model_name') else "cached",
                dimensions=len(self.cache[cache_key]),
                metadata={"cached": True}
            )
        
        # Compute embedding
        result = self.model.embed(text)
        
        # Cache result
        self._add_to_cache(cache_key, result.vector)
        
        return result
    
    def _hash(self, text: str) -> str:
        """Hash text for cache key."""
        import hashlib
        return hashlib.md5(text.encode()).hexdigest()
    
    def _add_to_cache(self, key: str, vector: np.ndarray):
        """Add to cache with LRU eviction."""
        
        if len(self.cache) >= self.cache_size:
            # Evict oldest
            oldest = self.access_order.pop(0)
            del self.cache[oldest]
        
        self.cache[key] = vector
        self.access_order.append(key)

class DimensionReducer:
    """Reduce embedding dimensions."""
    
    def __init__(self, target_dim: int = 256, method: str = "pca"):
        self.target_dim = target_dim
        self.method = method
        self.reducer = None
    
    def fit(self, embeddings: np.ndarray):
        """Fit reducer on embeddings."""
        
        if self.method == "pca":
            from sklearn.decomposition import PCA
            self.reducer = PCA(n_components=self.target_dim)
        elif self.method == "umap":
            import umap
            self.reducer = umap.UMAP(n_components=self.target_dim)
        elif self.method == "random":
            # Random projection
            from sklearn.random_projection import GaussianRandomProjection
            self.reducer = GaussianRandomProjection(n_components=self.target_dim)
        else:
            raise ValueError(f"Unknown reduction method: {self.method}")
        
        self.reducer.fit(embeddings)
    
    def transform(self, embedding: np.ndarray) -> np.ndarray:
        """Reduce embedding dimensions."""
        
        if self.reducer is None:
            raise ValueError("Reducer not fitted")
        
        if embedding.ndim == 1:
            embedding = embedding.reshape(1, -1)
        
        reduced = self.reducer.transform(embedding)
        
        return reduced[0] if reduced.shape[0] == 1 else reduced

class QuantizedEmbedding:
    """Quantize embeddings for storage efficiency."""
    
    def __init__(self, bits: int = 8):
        self.bits = bits
        self.min_val = None
        self.max_val = None
    
    def fit(self, embeddings: np.ndarray):
        """Fit quantization parameters."""
        
        self.min_val = embeddings.min()
        self.max_val = embeddings.max()
    
    def quantize(self, embedding: np.ndarray) -> np.ndarray:
        """Quantize embedding."""
        
        if self.min_val is None:
            self.min_val = embedding.min()
            self.max_val = embedding.max()
        
        # Normalize to [0, 1]
        normalized = (embedding - self.min_val) / (self.max_val - self.min_val)
        
        # Quantize (uint8 holds up to 8 bits; use a wider dtype for higher precision)
        max_int = 2 ** self.bits - 1
        dtype = np.uint8 if self.bits <= 8 else np.uint16
        quantized = np.round(normalized * max_int).astype(dtype)
        
        return quantized
    
    def dequantize(self, quantized: np.ndarray) -> np.ndarray:
        """Dequantize embedding."""
        
        max_int = 2 ** self.bits - 1
        normalized = quantized.astype(np.float32) / max_int
        
        return normalized * (self.max_val - self.min_val) + self.min_val

class MatryoshkaEmbedding:
    """Use Matryoshka embeddings for flexible dimensions."""
    
    def __init__(self, model: EmbeddingModel):
        self.model = model
    
    def embed(self, text: str, dimensions: int = None) -> EmbeddingResult:
        """Embed with optional dimension truncation."""
        
        result = self.model.embed(text)
        
        if dimensions and dimensions < result.dimensions:
            # Truncate and renormalize
            truncated = result.vector[:dimensions]
            truncated = truncated / np.linalg.norm(truncated)
            
            return EmbeddingResult(
                vector=truncated,
                model=result.model,
                dimensions=dimensions,
                metadata={"truncated_from": result.dimensions}
            )
        
        return result

class BatchOptimizer:
    """Optimize batch embedding."""
    
    def __init__(self, model: EmbeddingModel, max_batch_size: int = 32):
        self.model = model
        self.max_batch_size = max_batch_size
    
    def embed_optimal(self, texts: list[str]) -> list[EmbeddingResult]:
        """Embed with optimal batching."""
        
        results = []
        
        # Sort by length for better batching
        indexed_texts = list(enumerate(texts))
        indexed_texts.sort(key=lambda x: len(x[1]))
        
        # Process in batches
        for i in range(0, len(indexed_texts), self.max_batch_size):
            batch = indexed_texts[i:i + self.max_batch_size]
            batch_texts = [t for _, t in batch]
            
            batch_results = self.model.embed_batch(batch_texts)
            
            for (orig_idx, _), result in zip(batch, batch_results):
                results.append((orig_idx, result))
        
        # Restore original order
        results.sort(key=lambda x: x[0])
        
        return [r for _, r in results]

Embedding Fine-Tuning

from dataclasses import dataclass, field
from typing import Any, Optional
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

@dataclass
class ContrastivePair:
    """A contrastive learning pair."""
    
    anchor: str
    positive: str
    negative: Optional[str] = None

class ContrastiveDataset(Dataset):
    """Dataset for contrastive learning."""
    
    def __init__(self, pairs: list[ContrastivePair], tokenizer: Any):
        self.pairs = pairs
        self.tokenizer = tokenizer
    
    def __len__(self):
        return len(self.pairs)
    
    def __getitem__(self, idx):
        pair = self.pairs[idx]
        
        anchor = self.tokenizer(
            pair.anchor,
            truncation=True,
            max_length=512,
            return_tensors="pt"
        )
        
        positive = self.tokenizer(
            pair.positive,
            truncation=True,
            max_length=512,
            return_tensors="pt"
        )
        
        result = {
            "anchor_input_ids": anchor["input_ids"].squeeze(),
            "anchor_attention_mask": anchor["attention_mask"].squeeze(),
            "positive_input_ids": positive["input_ids"].squeeze(),
            "positive_attention_mask": positive["attention_mask"].squeeze()
        }
        
        if pair.negative:
            negative = self.tokenizer(
                pair.negative,
                truncation=True,
                max_length=512,
                return_tensors="pt"
            )
            result["negative_input_ids"] = negative["input_ids"].squeeze()
            result["negative_attention_mask"] = negative["attention_mask"].squeeze()
        
        return result

class EmbeddingFineTuner:
    """Fine-tune embedding models."""
    
    def __init__(
        self,
        model_name: str,
        output_dir: str,
        loss_type: str = "cosine"  # "cosine", "triplet", "contrastive", "mnrl"
    ):
        from sentence_transformers import SentenceTransformer
        
        self.model = SentenceTransformer(model_name)
        self.output_dir = output_dir
        self.loss_type = loss_type
    
    def train(
        self,
        train_pairs: list[ContrastivePair],
        eval_pairs: list[ContrastivePair] = None,
        epochs: int = 3,
        batch_size: int = 16,
        warmup_steps: int = 100
    ):
        """Train with contrastive learning."""
        
        from sentence_transformers import InputExample, losses
        from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
        
        # Convert to InputExamples
        train_examples = []
        for pair in train_pairs:
            if pair.negative:
                train_examples.append(InputExample(
                    texts=[pair.anchor, pair.positive, pair.negative]
                ))
            else:
                train_examples.append(InputExample(
                    texts=[pair.anchor, pair.positive],
                    label=1.0
                ))
        
        # Create DataLoader
        train_dataloader = DataLoader(
            train_examples,
            shuffle=True,
            batch_size=batch_size
        )
        
        # Select loss
        if self.loss_type == "cosine":
            loss = losses.CosineSimilarityLoss(self.model)
        elif self.loss_type == "triplet":
            loss = losses.TripletLoss(self.model)
        elif self.loss_type == "contrastive":
            loss = losses.ContrastiveLoss(self.model)
        elif self.loss_type == "mnrl":
            loss = losses.MultipleNegativesRankingLoss(self.model)
        else:
            raise ValueError(f"Unknown loss type: {self.loss_type}")
        
        # Evaluator
        evaluator = None
        if eval_pairs:
            eval_sentences1 = [p.anchor for p in eval_pairs]
            eval_sentences2 = [p.positive for p in eval_pairs]
            eval_scores = [1.0] * len(eval_pairs)
            
            evaluator = EmbeddingSimilarityEvaluator(
                eval_sentences1,
                eval_sentences2,
                eval_scores
            )
        
        # Train
        self.model.fit(
            train_objectives=[(train_dataloader, loss)],
            epochs=epochs,
            warmup_steps=warmup_steps,
            evaluator=evaluator,
            output_path=self.output_dir
        )
    
    def train_with_hard_negatives(
        self,
        queries: list[str],
        positives: list[str],
        corpus: list[str],
        epochs: int = 3
    ):
        """Train with mined hard negatives."""
        
        from sentence_transformers import losses
        from sentence_transformers.util import mine_hard_negatives
        
        # Mine hard negatives (encode the corpus once, not once per query)
        pairs_with_negatives = []
        corpus_embs = self.model.encode(corpus, convert_to_numpy=True)
        
        for query, positive in zip(queries, positives):
            # Find hard negatives for this query
            query_emb = self.model.encode(query, convert_to_numpy=True)
            
            # Get the most similar corpus entries that aren't the positive
            similarities = np.dot(corpus_embs, query_emb)
            sorted_indices = np.argsort(similarities)[::-1]
            
            for idx in sorted_indices[:5]:
                if corpus[idx] != positive:
                    pairs_with_negatives.append(ContrastivePair(
                        anchor=query,
                        positive=positive,
                        negative=corpus[idx]
                    ))
                    break
        
        # Train with triplet loss
        self.loss_type = "triplet"
        self.train(pairs_with_negatives, epochs=epochs)

Production Embedding Service

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional, Any
import numpy as np

app = FastAPI()

class EmbedRequest(BaseModel):
    texts: list[str]
    model: str = "default"
    dimensions: Optional[int] = None

class EmbedResponse(BaseModel):
    embeddings: list[list[float]]
    model: str
    dimensions: int
    usage: dict

class SimilarityRequest(BaseModel):
    text1: str
    text2: str
    model: str = "default"

# Initialize models
class MockEmbedder:
    def embed_batch(self, texts):
        return [type('obj', (object,), {'vector': np.random.randn(384)})() for _ in texts]
    
    @property
    def dimensions(self):
        return 384

models = {
    "default": MockEmbedder()
}

@app.post("/v1/embeddings")
async def create_embeddings(request: EmbedRequest) -> EmbedResponse:
    """Create embeddings for texts."""
    
    if request.model not in models:
        raise HTTPException(status_code=400, detail=f"Unknown model: {request.model}")
    
    model = models[request.model]
    
    # Embed
    results = model.embed_batch(request.texts)
    
    embeddings = []
    for result in results:
        vector = result.vector
        
        # Truncate if requested
        if request.dimensions and request.dimensions < len(vector):
            vector = vector[:request.dimensions]
            vector = vector / np.linalg.norm(vector)
        
        embeddings.append(vector.tolist())
    
    return EmbedResponse(
        embeddings=embeddings,
        model=request.model,
        dimensions=len(embeddings[0]) if embeddings else 0,
        usage={"total_tokens": sum(len(t.split()) for t in request.texts)}
    )

@app.post("/v1/similarity")
async def compute_similarity(request: SimilarityRequest) -> dict:
    """Compute similarity between two texts."""
    
    if request.model not in models:
        raise HTTPException(status_code=400, detail=f"Unknown model: {request.model}")
    
    model = models[request.model]
    
    results = model.embed_batch([request.text1, request.text2])
    
    emb1 = results[0].vector
    emb2 = results[1].vector
    
    # Cosine similarity
    similarity = float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))
    
    return {
        "similarity": similarity,
        "model": request.model
    }

@app.get("/v1/models")
async def list_models() -> dict:
    """List available models."""
    
    return {
        "models": [
            {
                "id": name,
                "dimensions": model.dimensions
            }
            for name, model in models.items()
        ]
    }

@app.get("/health")
async def health():
    return {"status": "healthy"}


Conclusion

Choosing the right embedding model depends on your specific use case, latency requirements, and budget. For general-purpose applications, OpenAI’s text-embedding-3-small offers excellent quality with reasonable cost and no infrastructure overhead. For on-premise deployment or cost-sensitive applications, sentence-transformers models like all-MiniLM-L6-v2 provide good quality with fast inference. For retrieval tasks, instruction-tuned models like E5 or BGE often outperform general embeddings because they’re trained specifically for asymmetric search. Always evaluate on your own data—benchmark scores don’t always translate to your domain. Consider dimension reduction for storage efficiency; Matryoshka embeddings or PCA can reduce dimensions significantly with minimal quality loss. Cache embeddings aggressively; recomputing the same text is wasteful. For fine-tuning, contrastive learning with hard negatives is most effective; mine negatives from your actual corpus for best results. In production, batch requests for throughput, implement proper caching, and monitor embedding quality over time as your data distribution may shift. The key insight is that embeddings are not one-size-fits-all—the best model for semantic search may not be the best for clustering, and domain-specific fine-tuning often provides significant improvements over off-the-shelf models.
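
As a rough illustration of evaluating on your own data, the sketch below wires two of the open models defined earlier into EmbeddingBenchmark; the model choices, sentence pairs, and labels are placeholders to replace with a labeled sample from your own domain.

# Hypothetical comparison of two local models on a tiny, invented similarity set.
models_to_compare = {
    "minilm": SentenceTransformerEmbedding("all-MiniLM-L6-v2"),
    "bge-small": BGEEmbedding("BAAI/bge-small-en-v1.5"),
}

# Invented pairs with human-style similarity labels (replace with real judgments)
pairs = [
    ("How do I reset my password?", "Steps to recover a forgotten password"),
    ("How do I reset my password?", "Password requirements for new accounts"),
    ("How do I reset my password?", "Quarterly revenue grew by 12%"),
]
labels = [1.0, 0.5, 0.0]

benchmark = EmbeddingBenchmark(models_to_compare)
print(benchmark.compare_dimensions())                               # e.g. {'minilm': 384, 'bge-small': 384}
print(benchmark.benchmark_quality(similarity_data=(pairs, labels)))  # Spearman score per model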