[{"content":"The Core Idea Attention lets each token in a sequence \u0026ldquo;look at\u0026rdquo; every other token and decide how much to weight its contribution.\nQuery, Key, Value Three projections of the input:\nQuery (Q): what this token is looking for Key (K): what each token offers Value (V): what each token contributes if selected $$\\text{Attention}(Q, K, V) = \\text{softmax}\\left(\\frac{QK^T}{\\sqrt{d_k}}\\right)V$$\nWhy It Works The dot product of Q and K measures compatibility. Dividing by $\\sqrt{d_k}$ prevents softmax from saturating in high dimensions.\nMulti-Head Attention Running attention in parallel across multiple learned subspaces lets the model attend to different relationship types simultaneously.\nKey Takeaways Self-attention is permutation-equivariant; position encodings add order. Time and memory cost is O(n²) in sequence length. The transformer replaced recurrence with attention entirely. ","permalink":"https://learning-notes-8ef.pages.dev/posts/machine-learning/understanding-attention/","summary":"\u003ch2 id=\"the-core-idea\"\u003eThe Core Idea\u003c/h2\u003e\n\u003cp\u003eAttention lets each token in a sequence \u0026ldquo;look at\u0026rdquo; every other token and decide how much to weight its contribution.\u003c/p\u003e\n\u003ch2 id=\"query-key-value\"\u003eQuery, Key, Value\u003c/h2\u003e\n\u003cp\u003eThree projections of the input:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eQuery (Q):\u003c/strong\u003e what this token is looking for\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eKey (K):\u003c/strong\u003e what each token offers\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eValue (V):\u003c/strong\u003e what each token contributes if selected\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e$$\\text{Attention}(Q, K, V) = \\text{softmax}\\left(\\frac{QK^T}{\\sqrt{d_k}}\\right)V$$\u003c/p\u003e\n\u003ch2 id=\"why-it-works\"\u003eWhy It Works\u003c/h2\u003e\n\u003cp\u003eThe dot product of Q and K measures 
compatibility. Dividing by $\\sqrt{d_k}$ prevents softmax from saturating in high dimensions.\u003c/p\u003e","title":"Understanding Attention Mechanisms in Transformers"},{"content":"The Problem Rust Solves Memory bugs — use-after-free, double-free, dangling pointers — are the root of most systems security vulnerabilities. Rust eliminates them at compile time.\nThe Three Rules Every value has exactly one owner. When the owner goes out of scope, the value is dropped. Ownership can be transferred (moved) or temporarily lent (borrowed). What Finally Clicked I stopped thinking about the borrow checker as a restriction and started thinking about it as a guarantee: there is always exactly one place responsible for cleanup.\nBorrowing in Practice fn print_len(s: \u0026amp;String) { // borrows, does not own println!(\u0026#34;{}\u0026#34;, s.len()); } fn main() { let s = String::from(\u0026#34;hello\u0026#34;); print_len(\u0026amp;s); // s still valid after this println!(\u0026#34;{}\u0026#34;, s); // works fine } Key Takeaways Move semantics are the default; clone() is explicit. \u0026amp;T is a shared borrow (read-only); \u0026amp;mut T is exclusive. The borrow checker enforces all of this at compile time, not runtime. ","permalink":"https://learning-notes-8ef.pages.dev/posts/programming/rust-ownership/","summary":"\u003ch2 id=\"the-problem-rust-solves\"\u003eThe Problem Rust Solves\u003c/h2\u003e\n\u003cp\u003eMemory bugs — use-after-free, double-free, dangling pointers — are the root of most systems security vulnerabilities. 
Rust eliminates them at compile time.\u003c/p\u003e\n\u003ch2 id=\"the-three-rules\"\u003eThe Three Rules\u003c/h2\u003e\n\u003col\u003e\n\u003cli\u003eEvery value has exactly one owner.\u003c/li\u003e\n\u003cli\u003eWhen the owner goes out of scope, the value is dropped.\u003c/li\u003e\n\u003cli\u003eOwnership can be transferred (moved) or temporarily lent (borrowed).\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch2 id=\"what-finally-clicked\"\u003eWhat Finally Clicked\u003c/h2\u003e\n\u003cp\u003eI stopped thinking about the borrow checker as a restriction and started thinking about it as a guarantee: \u003cstrong\u003ethere is always exactly one place responsible for cleanup\u003c/strong\u003e.\u003c/p\u003e","title":"Rust Ownership: Mental Models That Finally Clicked"},{"content":"The Core Argument The average human life is about four thousand weeks. You will never clear your to-do list. Accepting this — really accepting it — changes how you make decisions.\nWhat Struck Me The book\u0026rsquo;s central move is to reframe the productivity trap: the reason \u0026ldquo;getting on top of things\u0026rdquo; never works is that it\u0026rsquo;s based on a false premise — that a state of being on top of things is achievable.\nUseful Takeaways Choose what to fail at. Every yes is a no to everything else. Better to choose consciously than to let urgency choose for you.\nSettle. The fear of commitment is partly fear of cutting off alternatives. But an unchosen life — kept permanently open — is its own trap.\nResist instrumentalizing leisure. Rest that exists to make you more productive isn\u0026rsquo;t rest.\nVerdict Not a productivity book — an argument against the productivity mindset. Worth reading slowly.\n","permalink":"https://learning-notes-8ef.pages.dev/posts/books/four-thousand-weeks/","summary":"\u003ch2 id=\"the-core-argument\"\u003eThe Core Argument\u003c/h2\u003e\n\u003cp\u003eThe average human life is about four thousand weeks. You will never clear your to-do list. 
Accepting this — really accepting it — changes how you make decisions.\u003c/p\u003e\n\u003ch2 id=\"what-struck-me\"\u003eWhat Struck Me\u003c/h2\u003e\n\u003cp\u003eThe book\u0026rsquo;s central move is to reframe the productivity trap: the reason \u0026ldquo;getting on top of things\u0026rdquo; never works is that it\u0026rsquo;s based on a false premise — that a state of being on top of things is achievable.\u003c/p\u003e","title":"Notes: Four Thousand Weeks — Oliver Burkeman"},{"content":"What is Gradient Descent? Gradient descent minimizes a function by iteratively stepping in the direction of steepest descent.\nThe Intuition Imagine standing blindfolded on a hilly landscape. You want the lowest point. Strategy: feel the slope, step downhill, repeat.\nThe Math Given a loss function $L(\\theta)$, update parameters as:\n$$\\theta = \\theta - \\alpha \\nabla L(\\theta)$$\nwhere $\\alpha$ is the learning rate.\nA Simple Implementation def gradient_descent(grad_fn, theta, lr=0.01, steps=100): for _ in range(steps): theta -= lr * grad_fn(theta) return theta Key Takeaways Learning rate too large: diverges. Too small: slow. Vanilla GD uses the full dataset per step. Mini-batch GD is the practical default in deep learning. ","permalink":"https://learning-notes-8ef.pages.dev/posts/machine-learning/gradient-descent/","summary":"\u003ch2 id=\"what-is-gradient-descent\"\u003eWhat is Gradient Descent?\u003c/h2\u003e\n\u003cp\u003eGradient descent minimizes a function by iteratively stepping in the direction of steepest descent.\u003c/p\u003e\n\u003ch2 id=\"the-intuition\"\u003eThe Intuition\u003c/h2\u003e\n\u003cp\u003eImagine standing blindfolded on a hilly landscape. You want the lowest point. 
Strategy: feel the slope, step downhill, repeat.\u003c/p\u003e\n\u003ch2 id=\"the-math\"\u003eThe Math\u003c/h2\u003e\n\u003cp\u003eGiven a loss function $L(\\theta)$, update parameters as:\u003c/p\u003e\n\u003cp\u003e$$\\theta = \\theta - \\alpha \\nabla L(\\theta)$$\u003c/p\u003e\n\u003cp\u003ewhere $\\alpha$ is the learning rate.\u003c/p\u003e\n\u003ch2 id=\"a-simple-implementation\"\u003eA Simple Implementation\u003c/h2\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cpre tabindex=\"0\" style=\"color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;\"\u003e\u003ccode class=\"language-python\" data-lang=\"python\"\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e\u003cspan style=\"color:#66d9ef\"\u003edef\u003c/span\u003e \u003cspan style=\"color:#a6e22e\"\u003egradient_descent\u003c/span\u003e(grad_fn, theta, lr\u003cspan style=\"color:#f92672\"\u003e=\u003c/span\u003e\u003cspan style=\"color:#ae81ff\"\u003e0.01\u003c/span\u003e, steps\u003cspan style=\"color:#f92672\"\u003e=\u003c/span\u003e\u003cspan style=\"color:#ae81ff\"\u003e100\u003c/span\u003e):\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003efor\u003c/span\u003e _ \u003cspan style=\"color:#f92672\"\u003ein\u003c/span\u003e range(steps):\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e        theta \u003cspan style=\"color:#f92672\"\u003e-=\u003c/span\u003e lr \u003cspan style=\"color:#f92672\"\u003e*\u003c/span\u003e grad_fn(theta)\n\u003c/span\u003e\u003c/span\u003e\u003cspan style=\"display:flex;\"\u003e\u003cspan\u003e    \u003cspan style=\"color:#66d9ef\"\u003ereturn\u003c/span\u003e theta\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003ch2 id=\"key-takeaways\"\u003eKey Takeaways\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eLearning rate too large: diverges. 
Too small: slow.\u003c/li\u003e\n\u003cli\u003eVanilla GD uses the full dataset per step.\u003c/li\u003e\n\u003cli\u003eMini-batch GD is the practical default in deep learning.\u003c/li\u003e\n\u003c/ul\u003e","title":"Gradient Descent: From Intuition to Implementation"},{"content":"Hi, I\u0026rsquo;m Hieu, a CS student working as an AI/ML engineer.\nI spend my days training models that occasionally work, reading papers that occasionally make sense, and working through problems that occasionally don\u0026rsquo;t make me want to close my laptop forever. The keyword here is occasionally.\nThis journal exists because my brain leaks. I read something, think \u0026ldquo;ah, that\u0026rsquo;s clever,\u0026rdquo; and then forget it completely within 48 hours. So now I write it down. If it happens to be useful to you too, great: we can be confused together.\nTopics I\u0026rsquo;ll probably write about: AI/ML concepts, algorithms, books, and whatever rabbit hole I\u0026rsquo;ve fallen into this week.\nLinkedIn: I promise my profile is more professional than this page.\nGitHub\n","permalink":"https://learning-notes-8ef.pages.dev/about/","summary":"\u003cp\u003eHi, I\u0026rsquo;m Hieu, a CS student working as an AI/ML engineer.\u003c/p\u003e\n\u003cp\u003eI spend my days training models that occasionally work, reading papers that occasionally make sense, and working through problems that occasionally don\u0026rsquo;t make me want to close my laptop forever. The keyword here is \u003cem\u003eoccasionally\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003eThis journal exists because my brain leaks. I read something, think \u0026ldquo;ah, that\u0026rsquo;s clever,\u0026rdquo; and then forget it completely within 48 hours. So now I write it down. If it happens to be useful to you too, great: we can be confused together.\u003c/p\u003e","title":"About"}]