WebAssembly

  • Training GitHub Repository Embeddings with Stars


    GitHub stars, often used as bookmarks, signal which repositories are semantically similar. Roughly 1 TB of raw data from GitHub Archive was processed into an interest matrix covering 4 million developers, and metric learning techniques were then used to train embeddings for more than 300,000 repositories. A client-only demo runs vector search directly in the browser via WebAssembly, so no backend is needed. The system surfaces non-obvious library alternatives and supports semantic comparison of developer profiles. This matters because it makes software projects and developer interests easier to discover and compare, which could lead to more innovative and collaborative projects.

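To illustrate the kind of vector search the demo performs, here is a minimal brute-force sketch that ranks repositories by cosine similarity. The repository names, the 3-dimensional vectors, and the `top_k` helper are all hypothetical and exist only for illustration; the summary does not describe the demo's actual index or code.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k(query, embeddings, k=2):
    """Return the names of the k embeddings most similar to the query."""
    q = normalize(query)
    scored = []
    for name, vector in embeddings.items():
        score = sum(a * b for a, b in zip(q, normalize(vector)))
        scored.append((score, name))
    scored.sort(reverse=True)  # highest cosine similarity first
    return [name for _, name in scored[:k]]


# Made-up embeddings (assumption): real ones would have far more dimensions.
REPO_EMBEDDINGS = {
    "web-framework-a": [0.9, 0.1, 0.0],
    "web-framework-b": [0.8, 0.2, 0.1],
    "plotting-lib": [0.0, 0.1, 0.9],
}

nearest = top_k(REPO_EMBEDDINGS["web-framework-a"], REPO_EMBEDDINGS, k=2)
print(nearest)
```

A linear scan like this is the simplest possible baseline. Whether the demo scans all 300,000 vectors or uses an approximate index is not stated in the summary.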

  • EdgeVec v0.7.0: Browser-Based Vector Search


    EdgeVec v0.7.0 is a browser-based vector database that gives local AI applications cloud-like vector search without any network dependency. The release adds binary quantization for a 32x memory reduction, SIMD acceleration for up to 8.75x faster processing, and IndexedDB persistence so that data survives across sessions. Together these features support efficient local document search, offline retrieval-augmented generation (RAG), and privacy-preserving AI assistants, because the data stays entirely on the user's device. This matters because users can run advanced searches and AI tasks locally, keeping their data private and reducing reliance on cloud services.

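The 32x figure is consistent with replacing each 32-bit float component by a single bit. The sketch below shows one common way to do that, keeping only the sign of each component. This thresholding rule, the `binary_quantize` name, and the 768-dimension example are assumptions for illustration, not EdgeVec's documented method.

```python
def binary_quantize(vector):
    """Pack the sign of each component into one bit (1 if positive, else 0)."""
    packed = bytearray((len(vector) + 7) // 8)
    for i, x in enumerate(vector):
        if x > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


DIMS = 768  # assumed embedding size, chosen only for the arithmetic below

# Synthetic vector with alternating signs: +1, -2, +3, -4, ...
vector = [(-1.0) ** i * (i + 1) for i in range(DIMS)]

code = binary_quantize(vector)
float32_bytes = DIMS * 4          # 4 bytes per float32 component
ratio = float32_bytes / len(code)  # 3072 bytes / 96 bytes

print(len(code), ratio)
```

The trade-off is precision: magnitudes are discarded, so quantized search is usually paired with a distance that works on bits, such as Hamming distance.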

  • EdgeVec v0.7.0: Fast Browser-Native Vector Database


    EdgeVec is an open-source vector database that runs entirely in the browser using WebAssembly. Version 0.7.0 brings an 8.75x speedup in Hamming distance calculations through SIMD optimizations, a 32x memory reduction via binary quantization, and a 3.2x speedup in Euclidean distance computations. With EdgeVec, browser-based applications can perform semantic search and retrieval-augmented generation without server dependencies, which preserves privacy, reduces latency, and eliminates hosting costs. These gains make it feasible to hold large vector indices in the browser and to build offline-first AI tools. This matters because browser-native vector databases make sophisticated AI applications more private, faster, and cheaper for developers and users alike.

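For reference, Hamming distance between two bit-packed vectors is the number of set bits in their XOR. The scalar sketch below shows the operation that a SIMD implementation would accelerate. The function names and the 8-bit example codes are hypothetical, and nothing here reflects EdgeVec's actual API or its WebAssembly SIMD code.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    xor = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return bin(xor).count("1")  # population count of the XOR


def nearest_by_hamming(query: bytes, codes) -> int:
    """Return the index of the stored code closest to the query."""
    return min(range(len(codes)), key=lambda i: hamming_distance(query, codes[i]))


# Tiny made-up index of 8-bit codes (assumption, for illustration only).
codes = [bytes([0b11110000]), bytes([0b00001111]), bytes([0b11110001])]
query = bytes([0b11110011])

distances = [hamming_distance(query, c) for c in codes]
best = nearest_by_hamming(query, codes)

print(distances, best)
```

Because the whole computation is XOR plus a bit count, it maps naturally onto wide vector registers, which is one plausible reason this operation benefits so much from SIMD.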