[AINews] Not much happened today
Chapters
AI Twitter Recap
AI Discord Recap
Discord Server Highlights
Various AI Discord Community Updates
LLM Finetuning Discussions
CUDA Mode Discussions
Customizing Dtype String-Conversion, Binary/Trinary Matrix Multiplication Paper, Debugging PyTorch Errors, FakeTensor Issue Resolution
Discussions on Perplexity AI and ChatGPT
HuggingFace Discussions
Situations Discussed in LM Studio Channels
Discussion on Architectural Projects, Wildcards Plugin Issues, Celebrity AI Models
Mojo and Modular Discussions
Interconnects (Nathan Lambert) Discussion
OpenAccess AI Collective (axolotl)
AI Twitter Recap
- Claude 3 Opus provided recaps of AI and large language model developments. Highlights:
Gemini model performance:
- @arohan highlighted the Gemini 1.5 Flash model for high performance at low cost, making useful models accessible.
Optimizing Mixtral models with TensorRT:
- @rohanpaul_ai shared how Mixtral models can run faster on NVIDIA RTX GPUs using TensorRT-LLM.
Mamba-2 model architecture:
- @tri_dao and @_albertgu introduced Mamba-2, enabling sequence models with larger states, faster training, and connections between SSMs and linear attention.
Phi-3 model benchmarks:
- @_philschmid reported on Phi-3 Medium and Small models on the @lmsysorg leaderboard.
Prompt Engineering and Data Curation:
- @rohanpaul_ai emphasized the power of prompting LLMs correctly and highlighted a point from @sarahcat21 on the importance of data quality.
AI Discord Recap
Unsloth AI (Daniel Han) Discord
- VRAM Vanquished by Token Increase: Extending llama-3-8b to 64k tokens caused an OutOfMemoryError on an H100 with 80GB of VRAM; discussions aimed to resolve this through gradient checkpointing and tuned configurations (a sketch follows this list).
- Speedy Sustained LLM Pretraining: Unsloth AI's new update doubles speed and halves VRAM usage compared to Hugging Face + Flash Attention 2 QLoRA during continued pretraining of LLMs, as discussed in their recent blog.
- Questions on Multi-GPU and 8-bit Optimization: The community discussed multi-GPU support and tested Unsloth AI's performance across different GPU configurations, while noting the current limitations of fine-tuning with 8-bit quantization on models like phi-3-medium-4k.
- Unsloth Setup and Optimization Tactics: Instructions and troubleshooting tips for local Unsloth setup were shared, including the use of Jupyter Notebook and Docker, with links to the GitHub readme and Jiar/jupyter4unsloth. The community also covered LoRA rank calculation, referencing insights from Lightning AI.
- Community Cordiality Continues: New members were warmly welcomed, fostering a supportive environment for collaboration and knowledge exchange.
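The gradient-checkpointing fix for the VRAM issue above can be sketched with Unsloth's offloaded checkpointing option; the model checkpoint, LoRA rank, and target modules below are illustrative assumptions, not the exact configuration from the thread:

```python
# Minimal sketch of long-context fine-tuning with Unsloth's offloaded
# gradient checkpointing. Model name, rank, and target modules are
# illustrative assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # assumed 4-bit checkpoint
    max_seq_length=65536,                      # the 64k target from the thread
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                      # illustrative LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    use_gradient_checkpointing="unsloth",      # offloads activations to save VRAM
)
```

Checkpointing trades recomputation (or offloading) of activations for resident VRAM, which is what makes 64k-token sequences plausible on a single 80GB card.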
Discord Server Highlights
This section covers highlights from various Discord servers focused on AI, deep learning, and model optimization. Discussions range from performance puzzles in large language models to model recommendations for specific tasks. Engineers share insights on GPU usage, model optimization, AI limitations, and custom server builds. The community also delves into research advancements, debates over AI model performance, and challenges in implementing automated tools. Additionally, the section includes updates on AI funding rounds, new model releases, and community engagement within the AI Discord servers.
Various AI Discord Community Updates
This section provides updates from various AI Discord communities, including discussions on conferences, troubleshooting technical issues, sharing resources, and community interactions. Highlights include engineers reflecting on conference experiences, users seeking help with Python scripts and AI models, discussions on AI in medical diagnosis, navigating Discord channels for collaborations and announcements, resolving technical issues related to CUDA, and sharing insights on AI safety and regulations. New releases, job postings, and guides related to fine-tuning LLMs are also featured in this section.
LLM Finetuning Discussions
LLM Finetuning (Hamel + Dan) Discussions
- Replicate Credits Redemption: Members are reminded to check their emails for a redemption email from Replicate to claim credits.
- Creating Replicate Orgs: Clarification that creating orgs on Replicate requires GitHub orgs.
- Credit Visibility Concerns: Credits should be visible without billing setup but confirmation may be needed.
- Mixed Success in Claiming Credits: Some users successfully claimed Replicate credits while others reported not receiving them yet.
- Multimodal Finetuning Query: Inquiry if Axolotl supports multimodal finetuning.
- CPU-RAM Efficient Loading: Explanation of sharded model pieces utilization with cpu_ram_efficient_loading setting.
- Charles Modal Flex: Charles from Modal offers additional credits for trying Modal.
- Model Serving Insights: Enthusiasm about merging LoRA adapters into base models using Axolotl (a merge sketch follows this list).
- Medusa and LLM Innovations: Introduction of Medusa method to augment LLM inference.
- Quantization Challenges: Discussion on quantized inference challenges.
- RAG Experiments: Focus on experimenting with model training for educational purposes.
- PaliGemma for OCR Tasks: PaliGemma's OCR capabilities and more.
- Finetuning vs. Retrieval for Government Data: Considerations on finetuning LLMs with Q&A pairs for government data.
- Experiments with Opus 4o and MiniCPM-Llama3-V-2_5: Exploring finetuning these models for better performance.
- Link to Countryside Stewardship Grant Finder: Access information on grant options, capital items, and supplements. Link
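On the LoRA-merging enthusiasm above: Axolotl ships its own merge flow, but the underlying operation is the standard PEFT merge. A minimal sketch with placeholder model id and paths, assuming a causal LM base and a trained adapter:

```python
# Minimal PEFT merge sketch; "base-model-id" and the adapter path are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-id", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = model.merge_and_unload()   # folds LoRA deltas into the base weights
merged.save_pretrained("merged-model/")
```

Merging removes the adapter indirection at inference time, at the cost of losing the ability to hot-swap adapters.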
CUDA Mode Discussions
- CUDA Compiler Controversy: Users discussed compile errors in CUDA code; the resolution involved converting files from .cpp to .cu.
- Linker Hiccups: Members resolved linker errors by marking functions and variables inline, and debated the nuances of inline, static, and extern declarations.
- Grad Mirroring Mishap: A bug causing gradient divergence between single- and multi-GPU runs was traced and resolved by zeroing gradients after bridging from PyTorch to C (a sketch follows this list).
- Refactoring Frenzy: Code was refactored to isolate CUDA kernels into separate files, improving organization and performance.
- Loss Calculation Head-scratchers: Inconsistencies in loss calculations were discovered between setups, highlighting the complexities of DDP in PyTorch.
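The gradient-mirroring fix above boils down to clearing PyTorch's accumulated .grad buffers once they have been copied across the bridge, so the next backward() does not accumulate on top of already-mirrored values. A minimal sketch with a stand-in model (the actual C bridge is not shown):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                  # stand-in for the real network
loss = model(torch.randn(8, 4)).sum()    # stand-in loss
loss.backward()

# Stand-in for copying .grad buffers across the PyTorch-to-C bridge:
mirrored = [p.grad.detach().clone() for p in model.parameters()]

# The fix: zero grads after mirroring so the next backward() starts clean.
model.zero_grad(set_to_none=True)
```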
Customizing Dtype String-Conversion, Binary/Trinary Matrix Multiplication Paper, Debugging PyTorch Errors, FakeTensor Issue Resolution
- Customizing dtype string-conversion: Members discussed TrinaryTensor auto-converting from uint2 to uint8 when printed. Suggested solutions included overriding the __repr__ method to display values as -1, 0, 1 instead of 0, 1, 2 (a sketch follows this list).
- Interesting paper on binary/trinary matrix multiplication: A member shared an interesting paper on binary/trinary matrix multiplication. Additional resources included links to CUTLASS BMMA and NVIDIA's CUDA C Programming Guide.
- Debugging PyTorch's 'cache_size_limit reached' issue: Various strategies were discussed for debugging the torch._dynamo.exc.Unsupported: cache_size_limit reached error. Solutions involved marking parameters as dynamic, checking for graph breaks, and potentially increasing the cache size to avoid recompiles.
- PyTorch FakeTensor issue resolution: A proposed fix for the FakeTensor issue was shared via a link to a GitHub pull request, aiming to address problems with tensor metadata dispatching in functional tensors.
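A minimal sketch of the __repr__ override suggested above, assuming a simple wrapper whose raw uint8 codes {0, 1, 2} encode the logical values {-1, 0, +1} (the class shape is hypothetical):

```python
import torch

class TrinaryTensor:
    """Hypothetical wrapper: uint8 codes {0, 1, 2} encode values {-1, 0, +1}."""

    def __init__(self, codes: torch.Tensor):
        self.codes = codes.to(torch.uint8)

    def __repr__(self) -> str:
        # Decode for display so printing shows -1/0/1 rather than raw codes.
        return f"TrinaryTensor({(self.codes.to(torch.int8) - 1).tolist()})"

print(TrinaryTensor(torch.tensor([0, 1, 2, 2])))  # TrinaryTensor([-1, 0, 1, 1])
```

For the dynamo issue, raising torch._dynamo.config.cache_size_limit and marking dimensions with torch._dynamo.mark_dynamic(tensor, dim) are the usual knobs for reducing shape-specialized recompiles.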
Discussions on Perplexity AI and ChatGPT
Users discussed frustrations with the 600 limit on Opus usage and the lack of communication from Perplexity about potential adjustments. Members compared Perplexity AI Premium with ChatGPT, highlighting Perplexity's superior web search capabilities and model selection. Members also shared advice on using AI tools for school presentations, along with useful links and videos for understanding technical AI concepts.
HuggingFace Discussions
- HuggingFace had several discussions in different channels such as 'cool-finds' and 'computer-vision'.
- Members shared various topics including a study on preference alignment in language models in 'cool-finds'.
- In 'NLP', users talked about debugging function import issues.
- 'diffusion-discussions' included conversations about training ResNet-50 with 600 images and optimizing SDXL inference with JIT trace (a tracing sketch follows this list).
- OpenAI discussions covered topics like real AGI concepts in 'ai-discussions' and issues with ChatGPT in 'gpt-4-discussions'.
- 'prompt-engineering' in OpenAI discussed the challenges and alternatives in using prompts efficiently.
- Lastly, in 'LM Studio ▷ 💬-general', discussions on running models on CPU or GPU and handling model loading errors were highlighted.
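The 'diffusion-discussions' item above concerned tracing for faster inference; here is a minimal torch.jit.trace sketch, shown on ResNet-50 (which also came up in that channel) rather than the full SDXL pipeline, with an illustrative input shape:

```python
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()
example = torch.randn(1, 3, 224, 224)          # illustrative input shape

with torch.no_grad():
    traced = torch.jit.trace(model, example)   # records ops into a static graph

traced.save("resnet50_traced.pt")              # reload later with torch.jit.load
```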
Situations Discussed in LM Studio Channels
The LM Studio channels covered topics related to server processes, model token limits, downloading and managing models, and specific model recommendations. Members shared insights on inference speed testing, GPU performance, hardware choices, and alternative Linux setups, and delved into troubleshooting, driver considerations, trust in second-hand GPUs, and quantization explanations. Other AI-related channels touched on memory bandwidth limitations, Chinese language support inquiries, upcoming LM Studio releases, AVX extension pack testers, and code generation modules. Noteworthy mentions included music videos, admiration for design work, and upcoming projects like Nous World in 2030. Shared links covered solutions, training techniques, and evaluation insights in the AI and world-simulation domains.
Discussion on Architectural Projects, Wildcards Plugin Issues, Celebrity AI Models
- Architectural Projects: Users discussed using Stable Diffusion for architectural projects, focusing on interior previews based on drawings. One user found limitations with straight lines and mechanical detail; img2img with detailed drawings was suggested for better results (a sketch follows this list).
- Wildcards Plugin Issues: A user reports degraded image quality after installing the wildcards plugin on Stable Diffusion, with extremely grainy images and color blotches persisting despite re-installs.
- Community Models Recommendation: Members recommend community models from sites like civitai.com to enhance Stable Diffusion rendering quality. ChaiNNer is also suggested as an upscaler tool for batch upscaling images.
- Celebrities as AI Models: The rise of influencer and celebrity LoRas on platforms like Civit is discussed, noting an increase in AI-generated profiles and prompting the question of when a celebrity is not a celebrity.
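For the architectural img2img suggestion above, a minimal diffusers sketch; the model id, prompt, and strength value are illustrative assumptions, not from the thread:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Model id, prompt, and strength are illustrative placeholders.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

drawing = Image.open("interior_drawing.png").convert("RGB")
result = pipe(
    prompt="photorealistic interior, natural light",
    image=drawing,
    strength=0.6,   # lower values stay closer to the input drawing
).images[0]
result.save("interior_preview.png")
```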
Mojo and Modular Discussions
OpenRouter (Alex Atallah)
- Discussions included issues with ETH payments, prefill handling by LLMs, GPT-3.5 Turbo problems, Mistral model reliability, and best models for storytelling.
- Recommendations were made for using specific provider routing and LLM rankings for roleplay prompts.
Modular (Mojo)
- Topics covered speeding up Python with Numba (a sketch follows this list), Python generators, Mojo and Python execution, and community meetings.
- Performance and benchmark optimizations were discussed.
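A minimal sketch of the Numba speed-up idea mentioned above: @njit compiles a plain Python loop to machine code on first call (the dot-product workload is illustrative):

```python
import numpy as np
from numba import njit

@njit  # JIT-compiles the loop to machine code on first call
def dot(a, b):
    total = 0.0
    for i in range(a.size):
        total += a[i] * b[i]
    return total

x = np.random.rand(1_000_000)
y = np.random.rand(1_000_000)
print(dot(x, y))
```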
LangChain AI
- Various discussions on LangChain AI included topics such as Ollama models, customer support assistants, chat context in SQL agents, text categorization with embeddings, and persisting chatbot memory.
- Tools like automated chat analyzers, CrewAI news projects, and dynamic tool calling with LangChain were introduced (a tool-calling sketch follows this list).
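A minimal sketch of dynamic tool calling with LangChain, as referenced above; the tool itself and the model choice are illustrative assumptions:

```python
from langchain_core.tools import tool

@tool
def get_order_status(order_id: str) -> str:
    """Look up the shipping status of an order."""
    return f"Order {order_id}: shipped"  # stand-in for a real lookup

# Binding the tool to a chat model (model choice is illustrative):
# from langchain_openai import ChatOpenAI
# llm = ChatOpenAI(model="gpt-4o").bind_tools([get_order_status])
# llm.invoke("Where is order 1234?").tool_calls
```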
For more details, refer to the individual sections for each platform.
Interconnects (Nathan Lambert) Discussion
- Scaling Law Dismissal Meme: A user humorously portrayed evolution conversations about pattern-seeking brains and predators, ending with a jibe at scaling laws in 2030+.
- Call for Massive Parameter Scaling: Users jokingly pleaded for scaling AI models to 10 trillion parameters.
- Frustration with AGI Debates: Users expressed irritation at polarized AGI discussions, criticizing the lack of epistemic uncertainty and the belief in scaling trends without a 'pinch point.'
OpenAccess AI Collective (axolotl)
- Fun at conferences even without results: Attendees express enjoyment of conferences despite not having accepted papers, highlighting the value of participation itself.
- Struggling with custom ORPO formatter: A member seeks help loading a custom ORPO formatter Python script for tokenizing pre-converted datasets (related script linked).
- Critiquing AI in medical VQA: A tweet shared by a member criticizes state-of-the-art models like GPT-4V and Gemini Pro for performing worse than random in medical VQA tasks, introducing the ProbMed dataset to evaluate performance. Discussion arises on the inadequacy of vision LLMs for medical image diagnosis.
- Seeking arXiv endorsement: One member asks for endorsement on arXiv for the cs.LG category but later resolves the issue by using their organizational email.
FAQ
Q: What are some highlights from the AI Twitter Recap provided by Claude 3 Opus?
A: Highlights include discussions on Gemini model performance, Mixtral models optimization with TensorRT, Mamba-2 model architecture, Phi-3 model benchmarks, and more.
Q: What were some of the topics discussed in the Unsloth AI Discord community?
A: Topics included VRAM issues, LLM pretraining speed and efficiency, multi-GPU support, Unsloth setup optimization, and community interactions.
Q: What were some of the key discussions in the LLM Finetuning discussions by Hamel and Dan?
A: Discussions covered credits redemption, creating Replicate orgs, multimodal finetuning, model serving insights, quantization challenges, and experiments with various models.
Q: What were some of the challenges and solutions discussed in the Customizing dtype string-conversion section?
A: Members discussed challenges with TrinaryTensor auto-conversion, shared solutions like overriding __repr__ method, and explored interesting papers on matrix multiplication.
Q: What were some of the interesting topics discussed in the LM Studio channels?
A: Topics included server processes, model token limits, GPU performance, alternative Linux setups, troubleshooting, memory bandwidth limitations, and upcoming LM Studio releases.
Q: What were some of the notable discussions in the OpenRouter (Alex Atallah) community?
A: Discussions included ETH payment issues, prefill handling by LLMs, Mistral model reliability, the best models for storytelling, and recommendations for provider routing and LLM rankings for roleplay prompts.
Q: What topics were covered in the Modular (Mojo) discussions?
A: Topics included speeding up Python with Numba, Python generators, Mojo and Python execution, community meetings, and performance and benchmark optimizations.
Q: What were some of the interesting discussions in the LangChain AI Discord channel?
A: Discussions involved Ollama models, chat support assistants, text categorization, chatbot memory, automated chat analyzers, and dynamic tool calling.
Q: What were some of the humorous or light-hearted moments mentioned in the Scaling Law Dismissal Meme section?
A: Users humorously portrayed evolution conversations, jokingly requested 10 trillion parameter AI models, and expressed frustration with AGI debates.
Q: What were some of the diverse topics discussed across communities like HuggingFace, OpenAI, and LM Studio?
A: Topics ranged from a study on preference alignment in 'cool-finds' to debugging issues in 'NLP' and 'diffusion-discussions', along with music videos, design admiration, and project previews.