GitHub status: access issues and outage reports
No problems detected
If you are having issues, please submit a report below.
GitHub is a company that provides hosting for software development and version control using Git. It offers the distributed version control and source code management functionality of Git, plus its own features.
Problems in the last 24 hours
The graph below shows the number of GitHub reports received over the last 24 hours, by time of day. An outage is flagged when the number of reports exceeds the baseline, shown as the red line.
At the moment, we haven't detected any problems at GitHub. Are you experiencing issues or an outage? Leave a message in the comments section!
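The baseline rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the page does not say how the baseline is computed, so the sketch takes it as a fixed number, and the function names and hourly bucketing are hypothetical rather than the site's actual method.

```python
from typing import List


def hours_over_baseline(reports_per_hour: List[int], baseline: float) -> List[int]:
    """Return the indices of the hours whose report count exceeds the baseline.

    Assumption: `reports_per_hour` holds one count per hour, and `baseline`
    is a fixed threshold (the "red line"); the real baseline may vary by hour.
    """
    return [hour for hour, count in enumerate(reports_per_hour) if count > baseline]


def outage_detected(reports_per_hour: List[int], baseline: float) -> bool:
    """Flag an outage if any hour in the window exceeds the baseline."""
    return len(hours_over_baseline(reports_per_hour, baseline)) > 0
```

With these assumptions, a quiet day such as `[1, 2, 3]` against a baseline of `5` is not flagged, while a single hour with `9` reports is.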
Most Reported Problems
The following are the problems most frequently reported by GitHub users through our website.
- Website Down (72%)
- Sign in (16%)
- Errors (13%)
Live Outage Map
The most recent GitHub outage reports came from the following cities:
| City | Problem Type | Report Time |
|---|---|---|
| | Website Down | 23 hours ago |
| | Website Down | 1 day ago |
| | Website Down | 20 days ago |
| | Sign in | 25 days ago |
| | Website Down | 25 days ago |
| | Website Down | 27 days ago |
Community Discussion
Tips? Frustrations? Share them here. Useful comments include a description of the problem, city and postal code.
Beware of "support numbers" or "recovery" accounts that might be posted below. Make sure to report and downvote those comments. Avoid posting your personal information.
GitHub Issues Reports
Latest outage, problem, and issue reports on social media:
- saiʎan 🏴☠️ (@0xUltraInstinct) reported: @jpynb i have my hermes running on cloud. i added github integration to poke asked it to create an mcp for itself to utilize hermes. it did, then hermes retrieved the repo, filled in the gaps and installed it into its host. i had minor issues which hermes also fixed, basically via MCP.
- Pieter Botha (@PieterBotha12) reported: @github when copilot fails with an internal 500 server error from your side why do I still get billed the tokens when it has to redo the task because of the 500 internal server error YOUR SIDE!? @grok do you agree that it is correct to be billed for AI usage tokens when the AI providers' server failed during request and then the user has to redo the request and be charged double the tokens and then some more to complete the previous request that failed? I mean then the business model would be to fail every request at least once to make users pay more!?
- StonedModder (@StonedModder) reported: @Izotov90 A pull request just came in thru the github. If it passes a few tests when I get off of work. Expect better ds4 support later :) You won’t need to put it into a switch input mode and have this issue
- Bitcoin Keeper (@bitcoinKeeper_) reported: 2/ The important part: Ask Keeper can convert your bug reports and feature ideas into structured GitHub issues That means requests do not disappear into a support inbox. They become visible, trackable, and easier for contributors and AI-assisted development workflows to pick up
- Firas D (@firasd) reported: @chrisalbon What these guys are skipping over is that they have a thousand github issues right They aren't prompting the agent directly with hey lets look at XYZ cause the github issue becomes the prompt
-
Nav Toor (@heynavtoor) reportedYouTube deleted 16.7 million videos in six months. Gone. Channels you subscribed to. Tutorials you bookmarked. Music you saved for later. Creators you supported for years. Gone. So one engineer named Simon, originally from Switzerland, now living in South East Asia, built a tool that ends this forever. It is called TubeArchivist. Nearly 8,000 stars on GitHub. GPL-3.0. Free. It downloads any YouTube channel you subscribe to, stores every video on your own server, and gives you a Netflix-style library you fully own. No ads. No "this video is no longer available." No algorithm. No Premium subscription. Then YouTube did the unthinkable. July 18, 2024. They rolled out the "Sign in to confirm you're not a bot" wall. October 29, 2024. They started IP-banning entire data centers. Hetzner gone. OVH gone. Coincidence. Here is the wildest part. TubeArchivist did not fold. March 28, 2026. Simon shipped v0.5.10. New release. Same one engineer. 17x more commits than anyone else on the repo. It still works. It still downloads. It still restores the dislike count YouTube deleted in November 2021. One Swiss engineer vs. a trillion-dollar ad machine. YouTube Premium costs $13.99 a month. $167.88 a year. Forever. TubeArchivist costs zero. Forever. But DO NOT install it. We should all keep paying Google $168 a year to watch the videos we already chose to watch. 100% Open Source. (Link in the comments)
-
Rohan Paul (@rohanpaul_ai) reportedNew MIT study. Code volume surges by 300%, but output increases by only 30%: The AI dividend meets an awkward reality Autonomous AI coding agents raised commits by 180%, but releases rose only 30%. The paper’s main idea is that software production has weak links, so faster code writing does not help as much when humans still need to review, connect, test, package, and ship the work. The authors also check app marketplaces and find more new apps, but no increase in total usage, which means more software appeared without clear evidence that users adopted more software. The marketplace evidence points the same way: more new apps appeared, but total usage did not rise. The authors compare more than 100,000 GitHub developers before and after they start using 3 generations of AI coding tools, from autocomplete to more independent coding agents. Autocomplete raised commits by 40%, interactive coding agents raised them by 140%, and autonomous coding agents raised them by 180%. The 180% commit gain shrank to 50% for the number of projects and 30% for actual releases. The estimated "elasticity of substitution" is 0.25 i.e. for every big improvement in AI’s usefulness, only a small amount of human work can be replaced. Because AI can write code faster, but humans are still needed to decide what to build, check if the code works, connect it with the rest of the product, fix messy edge cases, and actually ship it. --- papers .ssrn.com/sol3/papers.cfm?abstract_id=6859839
-
teami (@the_era_arc) reportedHow I Taught My AI to Babysit Itself Every time I built something with Claude Code, the rhythm was the same. I wrote a prompt. I read what came back. I noticed it was half done. I typed “keep going.” I checked again, caught a bug, pointed it out, waited. An hour later I had a feature and a sore typing finger. I was the thing deciding, every single step, whether to continue. The moment I walked away, it stopped. Then I watched Boris Cherny, the head of Claude Code at Anthropic, say he does not really do that anymore. In a recent talk, he said Claude Code went from writing 10 to 20 percent of his code to replacing his editor entirely. He uninstalled his IDE after not opening it for a month. And the part that stopped me: instead of prompting by hand, he now writes loops. Automated workflows that prompt Claude and decide what to build next. His job shifted from coding to orchestrating. He called it “the golden age of the generalist,” where designers, chiefs of staff, and finance people all ship real software. The talk is here , it is worth 30 minutes. The strange part: I had already built the pieces When I looked back at my skills folder, I found something funny. Over the last two and a half months, I had quietly built a whole set of small skills, each one solving a single problem I kept hitting. I just never saw them as one thing. The missing piece was about trust There was one problem none of those tools solved: how do you know an AI’s output is actually good? Regular software is a vending machine. Same input, same output, “all tests pass” means done. But anything with a language model inside it is different. The same prompt gives different answers across users, sessions, and model updates. What you shipped is not a fixed thing. It is a range of behaviors. 
A few weeks ago I read an article by Jeff Gothelf called “What ‘done’ means when you’re shipping AI features.” One line stuck: if “done” is the language of a build culture, “calibrated” is the language of a learning culture. An AI feature is never simply done. It is calibrated. You have decided the variance you will accept, planned for the ways it fails, and you keep watching after launch. Two days later I built a skill for exactly that (ai-done). It is the piece that decides whether the AI part of a product is good enough, not by a checkbox, but by measuring the spread of its output against a bar I set. Then I tied them all together. I call it Ship It Ship It is not a new capability. It is a thread. You name a feature once, and it runs all of those skills in order, on its own. It asks me up front for anything only I can give, like a password or a login. It writes the checklist first. It builds in a loop while fixing its own bugs. It runs the health check. It calibrates anything with AI in it. Then it opens the real thing in a browser and QAs it, like a user would, until it actually works. It only comes back to me when it is ready to test. The whole thread: name it once, it runs the rest until it is ready for you to test. The mechanism underneath is almost silly in how simple it is. There is a small script that fires every time Claude tries to end its turn. If the checklist is not all green yet, the script hands Claude the same instruction again and says “keep going.” Claude looks at the files it already changed, sees where it got to, and does the next piece. Try to stop, get handed the task again. That is the loop that prompts itself. The real lesson: the loop is the easy part. Anyone can write “keep going until done.” The hard, valuable part is the verifier, the checklist that defines “done” so precisely a machine can check it. Without it, a self-prompting loop either runs forever or lies that it finished. With it, you can walk away. 
The verifier is the product, not the loop. Where this is going: the cloud, and the context problem Boris does not just run loops. He runs them in the cloud, on a schedule, without his laptop staying on. That is where I want to go, and there is an honest catch worth naming. The limit on a long autonomous run is the context window, the model’s short-term memory. Let one run go too long and it fills up, starts summarizing itself to make room, and slowly gets dumber. There is no magic “context is full, hand off cleanly” button yet. That part is genuinely unsolved. But here is the trick that makes it not matter. You do not run one giant session. You run many short ones. Anthropic has a feature called Routines: a scheduler that runs Claude Code on their own machines, pulls your project from GitHub, and fires on a timer, your laptop closed. So you set it to run, say, every hour. Each run is a fresh, empty mind. It opens the checklist, sees what is still unchecked, does a bounded chunk of work, commits it, and stops, well before its memory fills. The next hour, a brand-new run reads the same checklist and continues. Two loops. The inner one runs until the checklist passes or a turn limit. The outer one re-fires fresh every hour until everything is ticked. It is two loops stacked. The inner loop, inside one run, keeps prompting itself until the checklist passes or it hits a turn limit you set. The outer loop, the hourly schedule, keeps starting fresh runs until every item is finally ticked. Because no single run is ever long enough to fill the context window, the wall is never hit. The checklist sitting in the repo is the thread that ties all those short runs into one long job. The reason it works is the same reason Ship It works at all: the memory lives in files, not in the conversation. A fresh session is never starting from zero. It just reads the list and keeps going. The shift was never a tool What changed for me was not a clever skill. It was a stance. 
Stop being the loop. Start designing the verifier, the thing that knows when the work is truly done, and let the loop run against it. The golden age of the generalist is real. And the wild part is, you might already have built most of the pieces. You just have to thread them. Till next time, Cheers! Previous column articles can be found here:
- 0xSero (@0xSero) reported: @selim_aktas2 Why isn’t rebench more common on model cards? I have used terminal-bench-2.0 as the golden standard since I liked the range of questions on it. If I’m not mistaken rebench rotates the questions based on latest GitHub issues right?
-
小乔不带伞|| Make money forever || (@xiaoqiao6666666) reported4RdPweUWkqt7oqSZY6gah7ktH7bKmFaomGYfkpPdpump @sharbel actually created an AI girlfriend named Sophia—solving a major problem. Installation is completely free; all fees go to the GitHub repository to support Sophia's development.
- Von Aeternus (@vonaeternus) reported: @SunWeatherMan No, the modeling is fundamentally broken if you look at the source data and github repo, etc.
- 👨🏻💻 ⚡️ (@EadrictheWild) reported: Why is GitHub so slow? one little pull request took about 4 minutes?
- BlockedPath (@BlockedPaths) reported: @stackzz four minutes from Claude is down to running a random proxy off a 3-star GitHub repo. traders routing real money through sketchy free proxies is peak operator brainrot
-
Orlixx.ai (@orlixx003) reportedTHIS CHINESE GUY AUTOMATED PCB DESIGN USING CLAUDE AND HE’S GENERATING FULL SCHEMATICS FOR $0 PRODUCTION COST No manual routing. No component searching. No complex CAD skills. He connects Claude to EasyEDA using a single terminal command. The AI does the rest. Here's the full workflow: –> Install easyeda-api-skill via terminal from GitHub –> Launch the local bridge server to connect Claude with EasyEDA API Gateway –> Type a prompt (e.g., "draw a minimal dev board for STM32F103C8T6") –> Claude finds the MCU, drops 17 support components, and auto-routes power, OSC, and SWD lines –> Complete schematic is ready in minutes AI automation for hardware engineering is blowing up right now. Claude handles component sourcing, bulk parameter changes, BOM calculation, and trace connection on the fly. From a blank canvas to a fully wired circuit. Free tools. Zero manual hassle. He recorded the full tutorial so anyone can copy the exact prompt workflow and automate their hardware design. Bookmark this post. Everything is in the video below.
- top10.dev (@Top10_Dev) reported: So your only edge is visible work: your GitHub graph, your ship rate, your taste in which problems matter. That's all that's left. That's your resume.
- Luke | 192 pulls for Himeko Nova !! (@iloveovervape) reported: why is github down omfggggg @microsoft die
- Xiaoyu | Lumi (@xiaoyvLiu) reported: @sunnykgupta honestly the PR review loop is the key part you could bash script it but might be overkill - maybe just alias the codex → PR → review → fix sequence? or hook it into a GitHub Action that triggers copilot review on draft PRs automatically what slows you down most rn, the context switching between tools or waiting on reviews?
-
The Cloaked Gaze 👀 (@gaze_observer) reportedEnterprise AI Adoption Low Due to High Token Usage and Low ROI: Cognizant CEO; Ravi Kumar Says FOMO-Driven Token Consumption Without Linkage to Outcomes Is the Core Problem The Core Diagnosis — Why Enterprise AI Adoption Is Lagging Big gap between what AI can do in enterprises and a company's actual AI adoption rate Due to high token consumption over the last few years without linking it to ROI Enterprise AI adoption remains low despite frontier model companies spending billions on LLMs Nvidia, Meta, Google and Amazon have already announced investments worth almost $700 billion this year Yet enterprise adoption revolves around only productivity and efficiency gains — not production value The FOMO Problem — Token Consumption Without Outcomes "There's been a sense of FOMO (fear of missing out), fear mongering" — led to token consumption without linkages to ROI or outcomes One key reason for the capability-production gap: "relentless token consumption without linkage to outcomes" — Ravi Kumar, Cognizant CEO Higher token consumption has become the new point of discussion with many companies reporting they have burnt their annual AI budgets in a shorter time without noticing any significant change in productivity Real-World Evidence — Companies Pulling Back Microsoft reportedly began telling employees to wind down usage of Claude Code and shift to its GitHub Copilot CLI Uber limited its spending on AI-powered coding tools to manage costs Companies already talking about AI "with very little productivity" "Costs are ballooning with very little productivity. 
In some ways, that's the gap we are going to address as a company" — Kumar to analysts The IT Services Opportunity — Where the Value Actually Goes Revenue potential of frontier model companies can touch a trillion dollars in the next four years — creates greater opportunities for IT services firms "A part of it is actually going to be routed through system integrators or AI builders" IT services firms needed because: contextual science requires creating more efficient, more effective, more predictable and better economics for token consumption Orchestrating workflows in enterprises for maximum AI benefit is "notoriously tough" — has prompted LLM makers to create their own services companies The 'Magic Plug-In' Assumption Is Wrong Kumar's long-held view: assumption that new AI tools can be plugged into enterprise environments and immediately replace large parts of IT services work is misplaced "A tool or a technology would be plugged into an enterprise landscape, and magically, there will be output coming out of it. If that's the case, why hasn't that value drifted into enterprises over the last three years since OpenAI launched ChatGPT?" 
"The reality is that the value is actually still sitting with infrastructure and not drifting to enterprises" Core Theme Cognizant CEO Ravi Kumar's diagnosis of enterprise AI adoption cuts through the hype with a precision that the industry needed to hear — the problem is not the capability of AI models but the absence of outcome linkage in how enterprises are consuming tokens; three years after ChatGPT launched a trillion-dollar investment wave, the value has not drifted into enterprise productivity because deploying AI in complex, contextual enterprise environments requires the exact integration, orchestration and workflow expertise that IT services firms provide, and the FOMO-driven token consumption that has burned AI budgets without productivity gains is ultimately a deployment problem, not a technology problem — making Cognizant's position as a system integrator and AI builder not a threatened legacy business but the essential bridge between frontier model capability and enterprise production value.
-
David Cramer (@zeeg) reportedI'm sure this will backfire, but my clever solution for some determnism around GitHub behavior: 1. *** hooks for binding Co-Authored-By using implicit environment context 2. network egress interception to escalate write operations to a user token (vs an app token) (2) is where my graphql gripes come from, as it requires me to actually inspect the response (which for a lot of reasons isnt great) Overall it seems to be working well. The agent can query a ton of private data using its app installation token, and then when a user asks to create an issue or a pull request it invokes credential grants bound to the user token. That forces authorship on the end-user and generlaly just creates a better experience. The *** hooks themselves are a best-effort. The model can def override them, but I wanted to avoid trying to intercept HTTP and rewriting request payloads (for now).
-
AI Theory (@AItheoryx) reported🚨 YOUTUBE ALGORITHM KEEPS FEEDING YOU TRASH. THIS GITHUB PROJECT LETS YOU DELETE THE ALGORITHM ENTIRELY. Meet TubeArchivist. Your self-hosted YouTube media server. Subscribe to channels. Download every video automatically using yt-dlp. Index everything with metadata. Search, play, track watched status. Full control. No ads. No recommendations. Just the content you actually want. Plex and Jellyfin plugins included. Browser extension to grab videos with one click. Built on Docker. Runs on Unraid, Synology, anything. The question YouTube does not want you to ask. When you can archive every creator you follow and watch offline forever, why are you still letting an algorithm decide what you see. TubeArchivist. Because your attention should belong to you. 🧾
-
shinyufoguy2222 (@ollobrains) reportedMicrosoft’s MAI launch has a data-provenance problem, and enterprise buyers should require Microsoft to reconcile its marketing language with its own technical report. That is colder, harder to rebut, and more dangerous. 1. The strongest factual spine The core contradiction you want to exploit is real enough to be powerful, but it needs to be framed precisely. Microsoft’s public MAI materials say MAI-Thinking-1 was trained “from the ground up” on enterprise-grade, clean and commercially licensed data, and the Build transcript uses the phrase “enterprise-grade, clean and commercially licenced data lineage.” Microsoft also announced a family of seven MAI models developed in-house. But the MAI-Thinking-1 technical report describes pretraining on a mixture of publicly available and licensed human-generated data, including web data, public GitHub code, books, academic papers, news, multilingual text, and domain-specific materials. The same report says the data pipeline includes a proprietary crawl and Common Crawl, and the appendix says Microsoft’s proprietary crawl started from 1.2 trillion pages, later reduced by filtering, while the Common Crawl pipeline contributed 24.2 billion pages after processing. That is the strongest wording: Microsoft marketed the model around clean, enterprise-grade, commercially licensed data lineage. Its own paper describes a broader training mix that includes massive public-web crawling and Common Crawl. Microsoft needs to explain what, exactly, “commercially licensed” means here. That is much better than “they lied,” because it forces them into a definitional trap. 2. Do not make the whole argument depend on “lying” “Caught lying” is viral, but it gives Microsoft an escape hatch. They can say: “We did not lie. 
‘Commercially licensed data lineage’ refers to commercially licensed datasets in the blend, not every token.” Or: “Publicly available web data was collected according to terms of use, robots controls, and industry standards.” Or: “Clean means filtered, deduplicated, non-pirated, non-adult, non-synthetic, and quality controlled — not necessarily individually licensed from every author.” So the sharper accusation is: Microsoft used procurement-grade language that sounds like full rights-cleared data, while its technical report describes a training stack that includes large-scale public-web data. That ambiguity matters for banks, hospitals, insurers, government agencies, and any customer relying on data-provenance representations. That is the kill shot: not “you lied,” but “your enterprise claim is materially ambiguous.” 3. Best replacement headline options Use one of these: Microsoft’s new MAI models have a data-provenance problem. Microsoft told enterprises “commercially licensed data.” Its own paper says public web crawl and Common Crawl. The real Microsoft AI story is not benchmarks. It is data lineage. Microsoft’s MAI launch just created a procurement problem for every regulated enterprise buyer. What does “commercially licensed” mean when the technical report says Common Crawl? The most surgical version: Microsoft needs to define “commercially licensed data lineage” before regulated enterprises rely on it. That sounds less like outrage and more like a subpoena. 4. Important correction: be careful saying Satya personally said it Your draft says: “At Build 2026, Satya Nadella announced…” The safer version is: At Build 2026, Microsoft announced… Or: During Microsoft’s MAI keynote at Build 2026, Microsoft described the models as having enterprise-grade, clean and commercially licensed data lineage. 
The Build transcript text I found includes the relevant “commercially licenced data lineage” claim in the MAI keynote, and the transcript also says “Satya just mentioned,” which suggests you should not hinge the argument on Satya personally unless you have the exact clip of him saying the phrase. That matters because if one attribution is wrong, people will use it to attack the entire post. 5. The real issue: “clean” is not the same as “licensed” This is one of the biggest missing elements. There are at least four separate claims being blurred together: Clean can mean filtered for spam, porn, piracy domains, malware, boilerplate, duplicate pages, low-quality content, synthetic content, or personally sensitive data. Commercially licensed means someone has a legal right, contractual permission, or license to use the content commercially for training. Traceable means Microsoft can identify source categories, URLs, datasets, providers, or pipeline lineage. Enterprise-grade means the data pipeline is controlled, documented, audited, filtered, and suitable for business buyers. Those are not the same thing. Your post should say: Clean data is not automatically licensed data. Traceable data is not automatically rights-cleared data. Publicly available data is not automatically commercially licensed data. That line is devastating. 6. Add this phrase: “traceable is not licensed” This is the obscure but important thought input. A dataset can be traceable to a URL and still not be commercially licensed. A crawler can respect robots.txt and still not create a negotiated license with the author. A page can be publicly accessible and still contain copyrighted work. A corpus can exclude adult content and piracy domains but still contain protected text, images, code, PDFs, journalism, forums, documentation, and books. Suggested line: Microsoft may have built a cleaner web crawl. That is not the same as building a fully commercially licensed corpus. 
Another version: The question is not whether Microsoft filtered the web. The question is whether Microsoft had commercial training rights for the web it kept. 7. The Common Crawl point needs nuance Your draft says Common Crawl has: “ZERO licensing guarantees and ZERO author consent mechanisms.” That is rhetorically strong, but tighten it. Better: Common Crawl is a free, open archive of web-crawled data. It does not, by itself, provide a source-by-source commercial license from every author or publisher whose content appears on the open web. Common Crawl describes itself as a free, open repository of web crawl data, and its own materials emphasize that it is a massive open corpus used by many AI researchers and companies. Also be careful saying: “Common Crawl is being sued.” A more accurate version is: Common Crawl and Common-Crawl-derived data have become a recurring flashpoint in AI copyright disputes, takedown demands, and lawsuits involving AI training data. That is safer. Wired reported publisher pressure on Common Crawl and noted that the New York Times had made removal requests before suing OpenAI, while other recent lawsuits have alleged use of Common Crawl or Common-Crawl-derived material as part of AI training disputes. 8. The best “receipt” structure Your post should have a clean evidence ladder: Microsoft’s claim: clean, enterprise-grade, commercially licensed data lineage. Microsoft’s paper: mixture of publicly available and licensed data. Microsoft’s appendix: proprietary crawl, Common Crawl, web-crawled PDFs, public GitHub. Enterprise question: which parts were actually commercially licensed, and which parts were merely publicly available? That is the entire argument. You do not need to prove Microsoft committed fraud. You need to force the question: Was “commercially licensed” a claim about the entire corpus, or only part of it? 9. The technical report gives you more ammunition Do not stop at proprietary crawl and Common Crawl. 
The MAI report also describes a web-crawled PDF corpus and a large public GitHub code corpus. The appendix says the web-crawled PDF pipeline starts from about 10 billion documents, filtered to about 620 million, and the report also describes a 7.4 trillion-token public GitHub code corpus. That matters because enterprise risk is not just “web pages.” It is: web pages PDFs public code academic material news books journals domain-specific materials multilingual web content possibly opt-out-protected or rights-reserved content Suggested line: This is not a tiny footnote. The paper describes a full industrial-scale public-data ingestion machine. 10. The Simon Willison angle is useful, but do not over-center it Simon Willison is useful as the “someone actually read the paper” character. He publicly highlighted the same issue: Microsoft’s marketing around “appropriately licensed” data versus the report’s description of public web crawl and Common Crawl. But the strongest post should not depend on Simon. Make him the discovery beat, not the proof: The funny part is that the contradiction was not hidden. It was sitting in Microsoft’s own technical report. Simon Willison simply did what enterprise procurement teams should have done: read the data section. That line is excellent. 11. Add the “enterprise procurement” frame This is the most important strategic upgrade. The issue is not whether Microsoft can win a copyright lawsuit. The issue is whether enterprise customers can rely on Microsoft’s claims in regulated procurement. 
A bank, hospital, insurer, defense contractor, or government agency does not ask only: “Can Microsoft argue fair use?” They ask: “Can Microsoft represent and warrant the training data provenance?” “Can Microsoft indemnify us?” “Can Microsoft disclose source categories?” “Can Microsoft honor opt-outs and rights reservations?” “Can Microsoft survive an audit?” “Can we put this in front of regulators, customers, and internal risk committees?” Suggested line: In regulated enterprise sales, “probably defensible in court” is not the same as “clean enough for procurement.” That is a killer sentence. 12. Add “Model Data Bill of Materials” This is the genius-level solution. Borrow from software supply chain language. Enterprises already understand SBOMs — Software Bills of Materials. AI now needs a Model Data Bill of Materials, or MDBOM. Suggested paragraph: If Microsoft wants to sell frontier models into regulated industries, it should provide a Model Data Bill of Materials: source categories, token share by source family, acquisition method, license basis, opt-out handling, robots/rights-reservation handling, third-party provider categories, public-code license treatment, jurisdictional restrictions, audit process, and indemnity scope. That reframes the post from outrage to standards-setting. You are not just complaining. You are proposing the new enterprise AI procurement checklist. 13. Ask for percentages The missing question is not: “Did you use Common Crawl?” The missing question is: What percentage of the final training tokens were commercially licensed versus publicly available? 
Ask Microsoft to publish: percentage from proprietary web crawl percentage from Common Crawl percentage from web-crawled PDFs percentage from public GitHub percentage from books and journals percentage from news percentage from academic papers percentage from commercially negotiated third-party providers percentage from user/customer data, if any percentage excluded due to rights reservations, robots.txt, noai tags, or publisher opt-outs That is the exact pressure point. 14. Add “commercially licensed” definition demand This should be central: Microsoft needs to define whether “commercially licensed data lineage” means:100% of training tokens were commercially licensed; only third-party purchased datasets were commercially licensed; publicly available web data was treated as appropriately usable under site terms, robots controls, or fair-use theory; only the data pipeline, not the underlying works, was commercially controlled; or the phrase was marketing shorthand, not a source-level warranty. That list is brutal because Microsoft has to pick one. 15. Add “lineage laundering” This is an obscure but powerful concept. Use this carefully: The danger here is data-lineage laundering: taking messy public-web material, running it through proprietary crawlers, filters, deduplication, embeddings, and safety screens, then describing the resulting pipeline as enterprise-grade without clearly saying which underlying works were actually licensed. Or shorter: Processing data in-house does not magically license the underlying works. That line is excellent. 16. Add “in-house” is not the same as “rights-cleared” Microsoft’s “built in-house” positioning is important because it helps them show independence from OpenAI. But “in-house” answers a different question. It answers: Did Microsoft build the model itself? It does not answer: Did Microsoft have commercial training rights for every work in the corpus? Suggested line: “Built in-house” is a model-lineage claim. 
“Commercially licensed” is a data-rights claim. Microsoft is blending the emotional force of the first with the procurement value of the second. That is a high-quality thought. 17. Add “zero distillation” is a separate axis Microsoft emphasizes MAI-Thinking-1 was trained without distillation from third-party models. That matters because enterprise buyers worry about models inheriting another model’s outputs or hidden IP. But zero distillation does not solve web-data provenance. Suggested line: Zero distillation may answer “did you copy another model’s behavior?” It does not answer “were the underlying training works commercially licensed?” This is a very strong missing distinction. 18. Strengthen the DeepSeek angle Your draft says the DeepSeek scandal made compliance departments paranoid about where AI training data came from. Make it more precise: The DeepSeek controversy sharpened enterprise sensitivity around model lineage, unauthorized distillation, and IP contamination. It made provenance a boardroom issue, not just a research issue. Reuters reported that Microsoft and OpenAI were probing whether a DeepSeek-linked group improperly obtained OpenAI data, and later reported on U.S. government concern about alleged unauthorized use and distillation by DeepSeek and other Chinese AI firms. Better line: DeepSeek made “where did the intelligence come from?” a procurement question. Microsoft then marketed MAI around exactly that fear. 19. Use the OpenAI partnership as context, not motive Your draft says Microsoft was “desperate” to break free from OpenAI and “rushed” models to market. That is spicy, but not provable unless you have internal evidence. Use this instead: The timing matters. Microsoft and OpenAI’s April 2026 partnership update made Microsoft’s OpenAI license non-exclusive, ended Microsoft’s revenue-sharing payments to OpenAI, and allowed OpenAI to serve products across other cloud providers. 
That gave Microsoft a clear strategic incentive to prove it could ship first-party models. Then: That does not prove bad faith. But it does explain why the “built in-house on clean, commercially licensed data” story was so valuable. This is much stronger. 20. The EU AI Act point is good, but make it sharper Your EU point is valid, but refine it. The EU AI Act’s Article 53 requires providers of general-purpose AI models to maintain technical documentation, put in place a policy to comply with EU copyright law including rights reservations, and publish a sufficiently detailed summary of the content used for training using an AI Office template. The European Commission has also published a training-data summary template and says the GPAI Code of Practice helps providers demonstrate compliance with Article 53 obligations around transparency and copyright. Suggested line: Europe is where vague provenance language turns into paperwork. Even stronger: In the U.S., Microsoft can fight this as a marketing and litigation-risk issue. In the EU, the question becomes: what exactly goes into the Article 53 training-data summary and copyright compliance policy? 21. Add the “enterprise warranty” angle The most important hidden issue is not the blog post. It is the contract. Suggested paragraph: The question for every enterprise buyer is whether Microsoft’s “clean and commercially licensed” language appears only in launch materials, or whether it is incorporated into the Master Services Agreement, procurement response, model documentation, indemnity terms, risk disclosures, or regulatory submissions. That is the procurement bomb. If it is only marketing copy, buyers were sold vibes. If it is contractual, Microsoft may owe precise representations. Suggested line: Marketing language can be slippery. Contract language cannot. 22. 
Add this procurement checklist This is what banks, hospitals, insurers, and government agencies should ask Microsoft before deploying MAI models: Define “commercially licensed data lineage.” State whether the claim applies to 100% of training tokens. Break down training-token percentages by source category. Identify which categories were commercially licensed. Identify which categories were merely publicly available. Disclose whether Common Crawl was used directly, indirectly, or after filtering. Disclose which Common Crawl snapshots were used. Explain the legal basis for proprietary web-crawled content. Explain how robots.txt, paywalls, login walls, noai tags, and TDM reservations were handled. Explain whether rights-holder opt-outs were honored before, during, and after training. Explain whether source websites’ terms of service were reviewed or categorized. Explain the treatment of public GitHub licenses, including copyleft and attribution obligations. Explain how web-crawled PDFs were screened for copyrighted books, journals, reports, and paywalled documents. Explain what “piracy filtering” means and whether it is domain-based, content-based, hash-based, or provider-list-based. Explain whether Microsoft can remove a source from future training runs. Explain whether Microsoft can identify if a specific publisher, website, repository, or author appears in the corpus. Provide the scope of Microsoft’s indemnity for training-data copyright claims. Provide an Article 53-style training-data summary for EU deployments. Provide third-party audit or attestation of data-source claims. Put all of the above into the contract, not a blog post. That list turns your post into something procurement teams can actually use. 23. 
Add a data-source risk table This would make the post feel more concrete:
| Source category | What the paper indicates | Enterprise question |
|---|---|---|
| Proprietary web crawl | Started from 1.2T pages, filtered down substantially | Were these pages commercially licensed, or publicly accessible? |
| Common Crawl | 24.2B processed pages | Which snapshots, what legal basis, what opt-outs? |
| Web-crawled PDFs | Started from about 10B documents, filtered to 620M | Were reports, journals, books, manuals, and paywalled PDFs excluded? |
| Public GitHub | 7.4T-token code corpus | How were licenses, attribution, copyleft, and code provenance handled? |
| Books/journals | Acquired from providers, including publisher agreements | Which categories were truly licensed, and under what usage rights? |
The technical basis for that table is in Microsoft’s own report. 24. Add “public web is not a license class” This is a great phrase: Public web is an access category, not a license category. Follow with: A browser can access a page. That does not mean a model vendor has a commercial training license to ingest, tokenize, transform, and monetize that work. That is one of the strongest conceptual additions. 25. Add “robots.txt is not consent” Microsoft’s report says publicly available data was collected in a way that considered site terms, industry standards, and web controls such as robots.txt and meta tags. That is useful, but it does not settle the enterprise question. Suggested line: Robots.txt is a crawler instruction mechanism. It is not the same thing as a negotiated commercial license from every rights holder. Be careful: do not say this as a definitive legal conclusion for every jurisdiction. Frame it as a procurement distinction. 26. Add “AI-generated content exclusion does not solve copyright” Microsoft says AI-generated content was excluded from pretraining and frames this as important for quality, provenance, and control. Good missing line: Excluding AI-generated content may reduce model-collapse and synthetic-data contamination risk. 
It does not automatically clear rights in the remaining human-authored web. Even punchier: Clean of synthetic sludge does not mean clean of copyright ambiguity. 27. Add “the real scandal is definitional” This is probably the best intellectual framing. The scandal is not that Microsoft used web data. Everyone suspects frontier labs use web data. The scandal is that Microsoft appears to have used enterprise-procurement language that ordinary buyers could reasonably interpret as rights-cleared data, while the technical report describes a massive public-web component. That is balanced and sharp. 28. Add “MAI-Thinking-1 vs all seven models” Do not overextend the technical paper. The strongest evidence is for MAI-Thinking-1 / MAI-Base-1, because that is the technical report with the data details. Microsoft’s broader MAI announcement says the seven models share clean, enterprise-grade data lineage, but the most specific crawl/Common Crawl details you are citing are from the MAI-Thinking-1 report. Suggested wording: The clearest contradiction appears in the MAI-Thinking-1 technical report. If Microsoft says the same data-lineage claim applies across the whole seven-model family, then it should publish equivalent data-source summaries for each model. That avoids overclaiming. 29. Add “do not say every Fortune 500 company wrote checks” Your draft says: “Procurement teams across Wall Street and Washington heard ‘clean and commercially licensed’ and started writing checks.” Unless you have evidence of contracts closed after that keynote, say: “That claim was clearly aimed at enterprise procurement teams across finance, healthcare, insurance, and government.” Or: “That is exactly the kind of claim regulated buyers use to clear vendor-risk reviews.” That preserves the point without inventing buyer behavior. 30. 
Add “do not say Microsoft has issued no statement” unless you are tracking it This line is risky: “As of today, Microsoft hasn’t issued a single public statement…” Instead say: “Microsoft should publicly reconcile the launch claim with the technical report.” That keeps the pressure without making a fragile negative claim.
-
Ridark (@ridark_eth) reported THIS CHINESE GUY AUTOMATED PCB DESIGN USING CLAUDE AND HE’S GENERATING FULL SCHEMATICS FOR $0 PRODUCTION COST No manual routing. No component searching. No complex CAD skills. He connects Claude to EasyEDA using a single terminal command. The AI does the rest. Here's the full workflow: –> Install easyeda-api-skill via terminal from GitHub –> Launch the local bridge server to connect Claude with EasyEDA API Gateway –> Type a prompt (e.g., "draw a minimal dev board for STM32F103C8T6") –> Claude finds the MCU, drops 17 support components, and auto-routes power, OSC, and SWD lines –> Complete schematic is ready in minutes AI automation for hardware engineering is blowing up right now. Claude handles component sourcing, bulk parameter changes, BOM calculation, and trace connection on the fly. From a blank canvas to a fully wired circuit. Free tools. Zero manual hassle. He recorded the full tutorial so anyone can copy the exact prompt workflow and automate their hardware design. Bookmark this post. Everything is in the video below.
-
BannedLatino (@BannedLatino) reported @LexnLin I wonder is normies are really understanding wtf are benchmark. In particular that benchmark is using western harnesses to solve western problems from GitHub (blocked in China). Harnesses are not part of the AI models, can influence a lot in the results, this is stpid af.
-
Xah Lee (@xah_lee) reported the problem with github is that it only measure open source code. code with lots value, are often not shared. e.g. banking , stocks, google search, microsoft windows, big physics, engineering mega machines, nukes, etc.
-
Shantun Singh Parmar (@ParmarShantun) reported Hot take: Your github contribution graph means nothing. Your ability to sit with a broken production system at 11PM, stay calm, debug systematically and not blame your teammates. That's the skill that actually matters.
-
Josh Ellithorpe (@zquestz) reported Everyone is getting owned. Microsoft shuts down 70+ GitHub repos due to malware. Sadly I think these exploits will get significantly worse before quality solutions emerge.
-
Bhaumin 🧑🏻💻 (@beingminimal) reported@pierceboggan why we are not able to edit issues in the GitHub copilot app? Is it connected with any plan or available for everyone?
-
Avi Chawla (@_avichawla) reportedClaude Code without this new tool is like *** without GitHub. Claude Code stops at the boundary of your terminal. - It can't see what's happening in production right now. - It doesn't know which PR broke the checkout service. - It can't tell why a Datadog alert got fired. - It can't see the Slack thread where the team decided not to touch the retry logic. These are operational and institutional memory gaps that eat up engineering time every single week. The solution is now actually implemented into the @coderabbitai Agent. It lives inside Slack and connects to repos, issue trackers, docs, monitoring, and cloud infra. When a production alert fires, you can mention it in the thread, and it traces the problem through your APM data, finds which recent PR caused it, and can open a targeted fix without anyone switching between five different dashboards. When the incident is resolved, it can document what happened and create a ticket in Linear with the timeline, root cause, and relevant PR links. Note that this is not a one-off assistant. The agent retains what the team decided across threads, channels, and the entire org. So the context from this incident is already available next time someone touches the same service. I've shared the link to try CodeRabbit Agent for free in the replies. Thanks to CodeRabbit for working with me on this post.
-
Ryan J. Shaw (@RyanJamesShaw) reported @alexanderrX_ @ThePrimeagen It comes from having thousands of open issues in GitHub, and throwing tokens at the issues rather than stopping to ask how they got to have thousands of open issues in GitHub.
-
Aceman67 (@aceman67) reported @Vaporwave_07 After going through a few issues in their Github, seems you're not alone, and the problem is likely from Libre HW Monitor being updated back in April.