Browse by signal
Fast keyword tagging derived from titles and summaries. Expect more nuance as we add model-assisted tagging.
Top results tagged #research

Industry leaders agree collaboration is key to advancing critical technologies.
We investigate the scaling behaviour of multimodal transformers across data regimes.

Trail of Bits is publicly disclosing two vulnerabilities in elliptic, a widely used JavaScript library for elliptic curve cryptography that is downloaded over 10 million times weekly and is used by close to 3,000 projects. These vulnerabilities, caused by missing modular reductions and a missing length check, could allow attackers to forge signatures or prevent valid signatures from being verified, respectively. One vulnerability is still not fixed after a 90-day disclosure window that ended in October 2024 and remains unaddressed as of this publication. I discovered these vulnerabilities using Wycheproof, a collection of test vectors designed to test various cryptographic algorithms against known vulnerabilities. If you'd like to learn more about how to use Wycheproof, check out this guide I published. In this blog post, I'll describe how I used Wycheproof to test the elliptic library, how the vulnerabilities I discovered work, and how they can enable signature forgery or prevent signature verification.

Methodology

During my internship at Trail of Bits, I wrote a detailed guide on using Wycheproof for the new cryptographic testing chapter of the Testing Handbook. I decided to use the elliptic library as a real-world case study for this guide, which allowed me to discover the vulnerabilities in question. I wrote a Wycheproof testing harness for the elliptic package, as described in the guide. I then analyzed the source code covered by the various failing test cases provided by Wycheproof to classify them as false positives or real findings. With an understanding of why these test cases were failing, I then wrote proof-of-concept code for each bug. After confirming they were real findings, I began the coordinated disclosure process.

Findings

In total, I identified five vulnerabilities, resulting in five CVEs. Three of the vulnerabilities were minor parsing issues; I disclosed those in a public pull request against the repository and subsequently requested CVE IDs to keep track of them. Two of the issues were more severe, and I disclosed them privately using the GitHub advisory feature. Here are some details on these vulnerabilities.

CVE-2024-48949: EdDSA signature malleability

This issue stems from a missing out-of-bounds check specified in NIST FIPS 186-5, section 7.8.2, "HashEdDSA Signature Verification": decode the first half of the signature as a point R and the second half of the signature as an integer s, then verify that the integer s is in the range 0 ≤ s < n. The elliptic library omits this range check, which makes EdDSA signatures malleable: given a valid signature (R, s), anyone can add the group order n to s and obtain a second, distinct signature that still verifies for the same message and public key.

CVE-2024-48948: ECDSA signature verification error

The ECDSA issue lies in how elliptic truncates the message hash before verification, in the _truncateToN function:

    EC.prototype._truncateToN = function _truncateToN(msg, truncOnly) {
      var delta = msg.byteLength() * 8 - this.n.bitLength();
      if (delta > 0)
        msg = msg.ushrn(delta);
      ...
    };

The delta variable calculates the difference between the size of the hash and the order n of the current generator for the curve. If msg occupies more bits than n, it is shifted right by the difference. For this specific test case, we use secp192r1, which uses 192 bits, and SHA-256, which uses 256 bits; the hash should therefore be shifted by 64 bits to the right to retain the leftmost 192 bits. The issue in the elliptic library arises because the new BN(msg, 16) conversion removes leading zeros, resulting in a smaller hash that takes up fewer bytes:

    690ed426ccf17803ebe2bd0884bcd58a1bb5e7477ead3645f356e7a9

During the delta calculation, msg.byteLength() then returns 28 bytes instead of 32, and the miscalculation yields an incorrect delta of 32 (224 - 192) instead of 64 (256 - 192).
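To make the miscalculation concrete, here is a minimal sketch of the length loss (my illustration, not code from the post), using the bn.js package that elliptic builds on; the digest is an illustrative 32-byte value whose first four bytes are zero:

```js
// Minimal sketch (illustrative, not from the original post). Assumes the bn.js
// package that elliptic depends on. The digest is a made-up 32-byte value whose
// first four bytes are zero.
const BN = require('bn.js');

const digestHex =
  '00000000' + '690ed426ccf17803ebe2bd0884bcd58a1bb5e7477ead3645f356e7a9';

const msg = new BN(digestHex, 16);  // the leading zero bytes are dropped here
console.log(msg.byteLength());      // 28, even though the digest is 32 bytes long

// The same delta computation as elliptic's _truncateToN, for secp192r1 (192-bit n):
const delta = msg.byteLength() * 8 - 192;
console.log(delta);                 // 32, so the hash is shifted by 32 bits instead of 64
```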
Consequently, the hashed message is not shifted correctly, causing verification to fail. This issue causes valid signatures to be rejected if the message hash contains enough leading zeros, which happens with a probability of 2^-32. To fix this issue, an additional argument should be added to the verification function to allow the hash size to be passed in:

    EC.prototype.verify = function verify(msg, signature, key, enc, msgSize) {
      msg = this._truncateToN(new BN(msg, 16), undefined, msgSize);
      ...
    };

    EC.prototype._truncateToN = function _truncateToN(msg, truncOnly, msgSize) {
      var size = (typeof msgSize === 'undefined') ? (msg.byteLength() * 8) : msgSize;
      var delta = size - this.n.bitLength();
      ...
    };

On the importance of continuous testing

These vulnerabilities serve as an example of why continuous testing is crucial for ensuring the security and correctness of widely used cryptographic tools. In particular, Wycheproof and other actively maintained sets of cryptographic test vectors are excellent tools for ensuring high-quality cryptography libraries. We recommend including these test vectors (and any other relevant ones) in your CI/CD pipeline so that they are rerun whenever a code change is made; a minimal harness along these lines is sketched after the disclosure timeline below. This will ensure that your library is resilient against these specific cryptographic issues both now and in the future.

Coordinated disclosure timeline

For the disclosure process, we used GitHub's integrated security advisory feature to privately disclose the vulnerabilities and used the report template as a template for the report structure.

July 9, 2024: We discovered failed test vectors during our run of Wycheproof against the elliptic library.
July 10, 2024: We confirmed that both the ECDSA and EdDSA modules had issues and wrote proof-of-concept scripts and fixes to remedy them.

For CVE-2024-48949:
July 16, 2024: We disclosed the EdDSA signature malleability issue to the elliptic library maintainers using the GitHub security advisory feature and created a private pull request containing our proposed fix.
July 16, 2024: The elliptic library maintainers confirmed the existence of the EdDSA issue, merged our proposed fix, and created a new version without disclosing the issue publicly.
Oct 10, 2024: We requested a CVE ID from MITRE.
Oct 15, 2024: As 90 days had elapsed since our private disclosure, this vulnerability became public.

For CVE-2024-48948:
July 17, 2024: We disclosed the ECDSA signature verification issue to the elliptic library maintainers using the GitHub security advisory feature and created a private pull request containing our proposed fix.
July 23, 2024: We reached out to add an additional collaborator to the ECDSA GitHub advisory, but we received no response.
Aug 5, 2024: We reached out asking for confirmation of the ECDSA issue and again requested to add an additional collaborator to the GitHub advisory. We received no response.
Aug 14, 2024: We again reached out asking for confirmation of the ECDSA issue and again requested to add an additional collaborator to the GitHub advisory. We received no response.
Oct 10, 2024: We requested a CVE ID from MITRE.
Oct 13, 2024: Wycheproof test developer Daniel Bleichenbacher independently discovered and disclosed issue #321, which is related to this discovery.
Oct 15, 2024: As 90 days had elapsed since our private disclosure, this vulnerability became public.
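To make the continuous-testing recommendation concrete, here is a rough sketch of a Wycheproof harness for elliptic that could run as a CI step. This is not the author's actual harness: it assumes a local copy of Wycheproof's ecdsa_secp256r1_sha256_test.json and the project's usual vector layout (testGroups with key.wx/key.wy, and tests with tcId, msg, sig, result), which should be double-checked against the vector files you vendor.

```js
// Rough sketch of a Wycheproof harness for elliptic (not the post's actual code).
// Assumes a vendored copy of ecdsa_secp256r1_sha256_test.json and the usual
// Wycheproof layout: testGroups -> key.wx / key.wy, tests -> tcId / msg / sig / result.
const crypto = require('crypto');
const EC = require('elliptic').ec;

const vectors = require('./ecdsa_secp256r1_sha256_test.json');
const ec = new EC('p256');
let failures = 0;

for (const group of vectors.testGroups) {
  // The public key is given as affine coordinates in hex.
  const key = ec.keyFromPublic({ x: group.key.wx, y: group.key.wy }, 'hex');

  for (const test of group.tests) {
    // Wycheproof supplies the raw message; elliptic expects the digest.
    const hash = crypto.createHash('sha256')
      .update(Buffer.from(test.msg, 'hex'))
      .digest('hex');

    let verified;
    try {
      verified = key.verify(hash, test.sig);  // sig is DER-encoded hex
    } catch (err) {
      verified = false;                       // malformed signatures must not verify
    }

    // "valid" must verify, "invalid" must not, "acceptable" may go either way.
    const ok =
      (test.result === 'valid' && verified) ||
      (test.result === 'invalid' && !verified) ||
      test.result === 'acceptable';

    if (!ok) {
      failures += 1;
      console.error(`tcId ${test.tcId} (${test.comment}): expected ${test.result}, got ${verified}`);
    }
  }
}

process.exit(failures > 0 ? 1 : 0);  // fail the CI job on any mismatch
```

Wiring a script like this into a pipeline's test step is enough to catch regressions of this class whenever the dependency is updated.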

Meta on Tuesday said it has made available a tool called WhatsApp Research Proxy to some of its long-time bug bounty researchers to help improve the program and more effectively research the messaging platform's network protocol. The idea is to make it easier to delve into WhatsApp-specific technologies as the application continues to be a lucrative attack surface for state-sponsored actors and

In what appeared to be a bid to soak up some of Google's limelight prior to the launch of its new Gemini 3 flagship AI model — now recorded as the most powerful LLM in the world by multiple independent evaluators — Elon Musk's rival AI startup xAI last night unveiled its newest large language model, Grok 4.1. The model is now live for consumer use on Grok.com, social network X (formerly Twitter), and the company's iOS and Android mobile apps, and it arrives with major architectural and usability enhancements, among them: faster reasoning, improved emotional intelligence, and significantly reduced hallucination rates. xAI also commendably published a white paper on its evaluations, including a small section on the training process. Across public benchmarks, Grok 4.1 has vaulted to the top of the leaderboard, outperforming rival models from Anthropic, OpenAI, and Google — at least, Google's pre-Gemini 3 model (Gemini 2.5 Pro). It builds upon the success of xAI's Grok 4 Fast, which VentureBeat covered favorably shortly after its release back in September 2025. However, enterprise developers looking to integrate the new and improved Grok 4.1 into production environments will find one major constraint: it's not yet available through xAI's public API. Despite its high benchmarks, Grok 4.1 remains confined to xAI's consumer-facing interfaces, with no announced timeline for API exposure. At present, only older models—including Grok 4 Fast (reasoning and non-reasoning variants), Grok 4 0709, and legacy models such as Grok 3, Grok 3 Mini, and Grok 2 Vision—are available for programmatic use via the xAI developer API. These support up to 2 million tokens of context, with token pricing ranging from $0.20 to $3.00 per million depending on the configuration. For now, this limits Grok 4.1's utility in enterprise workflows that rely on backend integration, fine-tuned agentic pipelines, or scalable internal tooling. While the consumer rollout positions Grok 4.1 as the most capable LLM in xAI's portfolio, production deployments in enterprise environments remain on hold.

Model Design and Deployment Strategy

Grok 4.1 arrives in two configurations: a fast-response, low-latency mode for immediate replies, and a "thinking" mode that engages in multi-step reasoning before producing output. Both versions are live for end users and are selectable via the model picker in xAI's apps. The two configurations differ not just in latency but also in how deeply the model processes prompts. Grok 4.1 Thinking leverages internal planning and deliberation mechanisms, while the standard version prioritizes speed. Despite the difference in architecture, both scored higher than any competing models in blind preference and benchmark testing.

Leading the Field in Human and Expert Evaluation

On the LMArena Text Arena leaderboard, Grok 4.1 Thinking briefly held the top position with a normalized Elo score of 1483 — then was dethroned a few hours later with Google's release of Gemini 3 and its 1501 Elo score. The non-thinking version of Grok 4.1 also fares well on the index, however, at 1465. These scores place Grok 4.1 above Google's Gemini 2.5 Pro, Anthropic's Claude 4.5 series, and OpenAI's GPT-4.5 preview. In creative writing, Grok 4.1 ranks second only to Polaris Alpha (an early GPT-5.1 variant), with the "thinking" model earning a score of 1721.9 on the Creative Writing v3 benchmark. This marks a roughly 600-point improvement over previous Grok iterations.
Similarly, in the Arena Expert leaderboard, which aggregates feedback from professional reviewers, Grok 4.1 Thinking again leads the field with a score of 1510. The gains are especially notable given that Grok 4.1 was released only two months after Grok 4 Fast, highlighting the accelerated development pace at xAI.

Core Improvements Over Previous Generations

Technically, Grok 4.1 represents a significant leap in real-world usability. Visual capabilities—previously limited in Grok 4—have been upgraded to enable robust image and video understanding, including chart analysis and OCR-level text extraction. Multimodal reliability was a pain point in prior versions and has now been addressed. Token-level latency has been reduced by approximately 28 percent while preserving reasoning depth. In long-context tasks, Grok 4.1 maintains coherent output up to 1 million tokens, improving on Grok 4's tendency to degrade past the 300,000-token mark. xAI has also improved the model's tool orchestration capabilities. Grok 4.1 can now plan and execute multiple external tools in parallel, reducing the number of interaction cycles required to complete multi-step queries. According to internal test logs, some research tasks that previously required four steps can now be completed in one or two. Other alignment improvements include better truth calibration—reducing the tendency to hedge or soften politically sensitive outputs—and more natural, human-like prosody in voice mode, with support for different speaking styles and accents.

Safety and Adversarial Robustness

As part of its risk management framework, xAI evaluated Grok 4.1 for refusal behavior, hallucination resistance, sycophancy, and dual-use safety. The hallucination rate in non-reasoning mode has dropped from 12.09 percent in Grok 4 Fast to just 4.22 percent — a roughly 65 percent improvement. The model also scored 2.97 percent on FActScore, a factual QA benchmark, down from 9.89 percent in earlier versions. In the domain of adversarial robustness, Grok 4.1 has been tested with prompt injection attacks, jailbreak prompts, and sensitive chemistry and biology queries. Safety filters showed low false negative rates, especially for restricted chemical knowledge (0.00 percent) and restricted biological queries (0.03 percent). The model's ability to resist manipulation in persuasion benchmarks, such as MakeMeSay, also appears strong—it registered a 0 percent success rate as an attacker.

Limited Enterprise Access via API

Despite these gains, Grok 4.1 remains unavailable to enterprise users through xAI's API. According to the company's public documentation, the latest available models for developers are Grok 4 Fast (both reasoning and non-reasoning variants), each supporting up to 2 million tokens of context at pricing tiers ranging from $0.20 to $0.50 per million tokens. These are backed by a 4M tokens-per-minute throughput limit and a 480 requests per minute (RPM) rate cap. By contrast, Grok 4.1 is accessible only through xAI's consumer-facing properties—X, Grok.com, and the mobile apps. This means organizations cannot yet deploy Grok 4.1 via fine-tuned internal workflows, multi-agent chains, or real-time product integrations.

Industry Reception and Next Steps

The release has been met with strong public and industry feedback. Elon Musk, founder of xAI, posted a brief endorsement, calling it "a great model" and congratulating the team. AI benchmark platforms have praised the leap in usability and linguistic nuance.
For enterprise customers, however, the picture is more mixed. Grok 4.1’s performance represents a breakthrough for general-purpose and creative tasks, but until API access is enabled, it will remain a consumer-first product with limited enterprise applicability. As competitive models from OpenAI, Google, and Anthropic continue to evolve, xAI’s next strategic move may hinge on when—and how—it opens Grok 4.1 to external developers.
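For teams that want to build against xAI today, the practical route is one of the models already exposed through the developer API. The sketch below is a hedged example, not official sample code: it assumes xAI's OpenAI-compatible chat completions endpoint at api.x.ai and uses "grok-4-fast-reasoning" as the model identifier, which should be confirmed against xAI's current model listing.

```js
// Hedged sketch: calling xAI's OpenAI-compatible API with a model that is
// currently exposed for programmatic use, since Grok 4.1 has no API access yet.
// The model ID "grok-4-fast-reasoning" is an assumption; verify it against the
// provider's model listing before relying on it.
const API_KEY = process.env.XAI_API_KEY;

async function askGrok(prompt) {
  const res = await fetch('https://api.x.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: 'grok-4-fast-reasoning',
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`xAI API error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

askGrok('Summarize the trade-offs of consumer-only model releases.')
  .then(console.log)
  .catch(console.error);
```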

By plugging tens of billions of phone numbers into WhatsApp’s contact discovery tool, researchers found “the most extensive exposure of phone numbers” ever—along with profile photos and more.

Cybersecurity researchers have disclosed details of a cyber attack targeting a major U.S.-based real-estate company that involved the use of a nascent command-and-control (C2) and red teaming framework known as Tuoni. "The campaign leveraged the emerging Tuoni C2 framework, a relatively new, command-and-control (C2) tool (with a free license) that delivers stealthy, in-memory payloads,"

Interviewer: Jillian York

Benjamin Ismail is the Campaign and Advocacy Director for GreatFire, where he leads efforts to expose the censorship apparatus of authoritarian regimes worldwide. He also oversees the App Censorship Project, including the AppleCensorship.com and GoogleCensorship.org platforms, which track mobile app censorship globally. From 2011 to 2017, Benjamin headed the Asia-Pacific desk at Reporters Without Borders (RSF).

Jillian York: Hi Benjamin, it's great to chat with you. We got to meet at the Global Gathering recently, we did a short video there, and it was wonderful to get to know you a little bit. I'm going to start by asking you my first basic question: What does free speech or free expression mean to you?

Benjamin Ismail: Well, it starts with a very, very big question. What I have in mind is a cliche answer, but it's what I genuinely believe. I think about all freedoms. So when you say free expression, free speech, or freedom of information or Article 19, all of those concepts are linked together, and I immediately think of all human rights at once. Because what I have seen during my current or past work is how that freedom is really the cornerstone of all freedoms. If you don't have that, you can't have any other freedom. If you don't have freedom of expression, if you don't have journalism, you don't have pluralism of opinions—you have self-censorship. You have realities, violations, that exist but are not talked about, and are not exposed, not revealed, not tackled, and nothing is really improved without that first freedom.

I also think about Myanmar, because I remember going there in 2012, when the country had just opened after the democratic revolution. We got the chance to meet with many officials and ministers, and we got to tell them that they should start with that, because their line was "don't worry, don't raise freedom of speech, freedom of the press will come in due time." And we were saying "no, that's not how it works!" It doesn't come in due time when other things are being worked on. It starts with that so you can work on other things. And so I remember very well those meetings and how actually, unfortunately, the key issues that re-emerged afterwards in the country were precisely due to the fact that they failed to truly implement free speech protections when the country started opening.

JY: What was your path to this work?

BI: This is a multi-faceted answer. So, I was studying Chinese language and civilization at the National Institute of Oriental Languages and Civilizations in Paris, along with political science and international law. When I started that line of study, I considered maybe becoming a diplomat… that program led to preparing for the exams required to enter the diplomatic corps in France. But I also heard negative feedback about the Ministry of Foreign Affairs and, notably, first-hand testimonies from friends and fellow students who had done internships there. I already knew that I had a little bit of an issue with authority. My experience as an assistant at Reporters Without Borders challenged the preconceptions I had about NGOs and civil society organizations in general. I was a bit lucky to come at a time when the organization was really trying to find its new direction, its new inspiration. So it was a brief phase where the organization itself was hungry for new ideas. Being young and not very experienced, I was invited to share my input, my views—among many others of course.
I saw that you can influence an organization's direction, actions, and strategy, and see the materialization of those strategic choices, such as launching a campaign, setting priorities, and deciding how to tackle issues like freedom of information and the protection of journalists in various contexts. That really motivated me, and I realized that I would have much less to say if I had joined an institution such as the Ministry of Foreign Affairs. Instead, I was part of a human-sized group, about thirty-plus employees working together in one big open space in Paris. After that experience I set my mind on joining the civil society sector, focusing on freedom of the press. When you work on journalistic issues, you get to touch on many different issues in many different regions, and I really like that. So even though it's kind of monothematic, it's a single topic that encompasses everything at the same time. I was dealing with safety issues for Pakistani journalists threatened by the Taliban. At the same time I followed journalists pressured by corporations such as TEPCO and the government in Japan for covering nuclear issues. I got to touch on many topics through the work of the people we were defending and helping. That's what really locked me onto this specific human right.

I was already interested in political and civil rights when I was studying, but after that first experience, at the end of 2010, I went to China and got called by Reporters Without Borders. They told me that the head of the Asia desk was leaving and invited me to apply for the position. At that time, I was in Shanghai, working to settle down there. The alternative was accepting a job that would take me back to Paris but likely close the door on any return to China. Once you start giving interviews to outlets like the BBC and CNN, well… you know how that goes—RSF was not viewed favorably in many countries. Eventually, I decided it was a huge opportunity, so I accepted the job and went back to Paris, and from then on I was fully committed to that issue.

JY: For our readers, tell us what the timeline of this was.

BI: I finished my studies in 2009. I did my internship with Reporters Without Borders that year and continued to work pro bono for the organization on the Chinese website in 2010. Then I went to China, and in January 2011, I was contacted by Reporters Without Borders about the departure of the former head of the Asia-Pacific desk. I did my first and last fact-finding mission in China, and went to Beijing. I met the artist Ai Weiwei in Beijing just a few weeks before he was arrested, around March 2011, and finally flew back to Paris and started heading the Asia desk. I left the organization in 2017.

JY: Such an amazing story. I'd love to hear more about the work that you do now.

BI: The story of the work I do now actually starts in 2011. That was my first year heading the Asia-Pacific desk. That same year, a group of anonymous activists based in China started a group called GreatFire. They launched their project with a website where you can type any URL you want, and that website will test the connection from mainland China to that URL and tell you if it's accessible or blocked. They also kept the test records so that you can look at the history of the blocking of a specific website, which is great. That was GreatFire's first project for monitoring web censorship in mainland China. We started exchanging information, working on the issue of censorship in China.
They continued to develop more projects, which I tried to highlight as well. I also helped them to secure some funding. At the very beginning, they were working on these things as a side job. And progressively they managed to get some funding, which was very difficult because of the anonymity. One of the things I remember is that I helped them get some funding from the EU through a mechanism called "Small Grants", where every grant would be around €20,000–30,000. The EU, you know, is a bureaucratic entity, and they were demanding some paperwork and documents. I told them that they wouldn't be able to get the real names of the people working at GreatFire, but that they shouldn't be concerned about that, because what they wanted was to finance that tool. So if we were to show them that the people they were going to send the money to were actually the people controlling that website, then it would be fine. And so we featured a little EU logo, just for one day I think, on the footer of the website so they could check that. And that's how we convinced the EU to support GreatFire for that work.

Also, there's this tactic called "Collateral Freedom" that GreatFire uses very well. The idea is that you host sensitive content on HTTPS servers that belong to companies which also operate inside China and are accessible there. Because it's HTTPS, the connection is encrypted, so the authorities can't just block a specific page—they can't see exactly which page is being accessed. To block it, they'd have to block the entire service. Now, they can do that, but it comes at a higher political and economic cost, because it means disrupting access to other things hosted on that same service—like banks or major businesses. That's why it's called "collateral freedom": you're basically forcing the authorities to risk broader collateral damage if they want to censor your content.

When I was working for RSF, I proposed that we replicate that tactic on the 12th of March—that's the World Day Against Cyber Censorship. We had the habit of publishing what we called the "Enemies of the Internet" report, where we would highlight and update the situation in the countries carrying out the harshest repression online; countries like Iran, Turkmenistan, North Korea, and of course, China. I suggested in a team meeting: "What if we highlighted the good guys? Maybe we could highlight 10 exiled media and use collateral freedom to uncensor those." And so we did: some Iranian media, Egyptian media, Chinese media, and Turkmen media were uncensored using mirrors hosted on HTTPS servers owned by big, and thus harder to block, companies… and that's how we started to do collateral freedom, and it continued to be an annual thing. I also helped in my personal capacity, including after I left Reporters Without Borders.

After I left RSF, I joined another NGO focusing on China, which I also knew from my time at RSF. I worked with that group for a year and a half; then GreatFire contacted me to work on a website specifically. So here we are, at the beginning of 2020: they had just started this website called Applecensorship.com, which allowed users to test the availability of any app in any of Apple's 175 App Stores worldwide. They needed a better website—one that allowed advocacy content—for that tool. The idea was to make a website useful for academics doing research, journalists investigating app store censorship and control, and human rights NGOs and civil society organizations interested in the availability of any tool.
Apple's censorship in China started quickly after the company entered the Chinese market in 2010. In 2013, one of GreatFire's projects, which had been turned into an iOS app, was removed by Apple 48 hours after its release on the App Store, at the demand of the Chinese authorities. That project was Free Weibo, which is a website that features censored posts from Weibo, the Chinese equivalent of Twitter—we crawl social media, detect censored posts, and republish them on the site. In 2017 it was reported that Apple had removed all VPNs from the Chinese App Store. That episode in 2013, together with Apple's growing censorship in China (and in other places too), led to the creation of AppleCensorship in 2019. GreatFire asked me to work on that website. The transformation into an advocacy platform was successful. I then started working full time on that project, which has since evolved into the App Censorship Project, which includes another website, googlecensorship.org (offering features similar to Applecensorship.com but for the 224 Play Stores worldwide). In the meantime, I became the head of campaigns and advocacy, because of my background at RSF.

JY: I want to ask you, looking beyond China, what are some other places in the world that you're concerned about at the moment, whether on a professional basis or maybe just as a person? What are you seeing right now in terms of global trends around free expression that worry you?

BI: I think, like everyone else, that what we're seeing in Western democracies—in the US and even in Europe—is concerning. But I'm still more concerned about authoritarian regimes than about our democracies. Maybe it's a case of not learning my lesson or of naive optimism, but I'm still more concerned about China and Russia than I am about what I see in France, the UK, or the US. There has been some recent reporting about China developing very advanced censorship and surveillance technologies and exporting them to other countries like Myanmar and Pakistan.

What we're seeing in Russia—I'm not an expert on that region, but we heard experts saying back in 2022 that Russia was trying to increase its censorship and control, but that it couldn't become like China, because China had exerted control over its internet from the very beginning: they removed Facebook back in 2009, then Google was pushed away by the authorities (and the market). And the Chinese authorities successfully filled the gaps left by the absence of those foreign Western companies. Some researchers working on Russia were saying that it wasn't really possible for Russia to do what China had done, because it was unprepared and China had engineered it for more than a decade. What we are seeing now is that Russia is close to being able to close its internet, to close the country, to replace services with its own controlled ones. It's not identical, but it's also kind of replicating what China has been doing. And that's a very sad observation to make.

Beyond the digital sphere, the question of how far Putin is willing to go in escalating is also concerning. As a human being and an inhabitant of the European continent, I'm concerned by the ability of a country like Russia to isolate itself while waging a war. Russia is engaged in a real war and at the same time is able to completely digitally close down the country. Between that and the example of China exporting censorship, I'm not far from thinking that in ten or twenty years we'll have a completely splintered internet.
JY: Do you feel like having a global perspective like this has changed or reshaped your views in any way?

BI: Yes, in the sense that when you start working with international organizations, and you start hearing about the world and how human rights are universal values, and you get to meet people and go to different countries, you really get to experience how universal those freedoms and aspirations are. When I worked at RSF and lobbied governments to pass a good law or abolish a repressive one, or when I worked on the case of a jailed journalist or blogger, I got to talk to authorities and to hear weird justifications from certain governments (not mentioning any names, but Myanmar and Vietnam) like "those populations are different from the French," and I would receive pushback that the ideas of freedom I was describing were not applicable to their societies. It's a bit destabilizing when you hear that for the first time. But as you gain experience, you can clearly explain why human rights are universal and why different populations shouldn't be ruled differently when it comes to human rights. Everyone wants to be free.

This notion of "universality" is comforting, because when you're working for something universal, the argument is there. The freedoms you defend can't be challenged in principle, because everyone wants them. If governments and authorities really listened to their people, they would hear them calling for those rights and freedoms. Or that's what I used to think. Now we hear this growing rhetoric that we (people from the West) are exporting democracy, that it's a Western value and not a universal one. This discourse, notably developed by Xi Jinping in China, which frames "Western democracy" as a new concept, is a complete fallacy. Democracy was invented in the West, but democracy is universal. Unfortunately, I now believe that, in the future, we will have to justify and argue much more strongly for the universality of concepts like democracy, human rights, and fundamental freedoms.

JY: Thank you so much for this insight. And now for our final question: Do you have a free speech hero?

BI: No.

JY: No? No heroes? An inspiration, maybe.

BI: On the contrary, I've been disappointed so much by certain figures that were presented as human rights heroes… like Aung San Suu Kyi during the Rohingya crisis, on which I worked when I was at RSF. Myanmar officially recognizes 135 ethnic groups, but somehow this one additional ethnic minority (the Rohingya) is impossible for them to accept. It's appalling. It's weird to say, but some heroes are not really good people either. Being a hero is doing a heroic action, but people who do heroic actions can also do very bad things before or after, at a different level. They can be terrible people, husbands, or friends and be a "human rights" hero at the same time. Some people really inspired me, but they're not public figures. They are freedom fighters, but they are not "heroes". They remain in the shadows. I know their struggles; I see their determination, their conviction, and how their personal lives align with their role as freedom fighters. These are the people who truly inspire me.
arXiv:2511.11040v1 Announce Type: new Abstract: Recent studies on LLM agent scaling have highlighted the potential of Multi-Agent Debate (MAD) to enhance reasoning abilities. However, the critical aspect of role allocation strategies remains underexplored. In this study, we demonstrate that allocating roles with differing viewpoints to specific positions significantly impacts MAD's performance in reasoning tasks. Specifically, we find a novel role allocation strategy, "Truth Last", which can improve MAD performance by up to 22% in reasoning tasks. To address the issue of unknown truth in practical applications, we propose the Multi-Agent Debate Consistency (MADC) strategy, which systematically simulates and optimizes its core mechanisms. MADC incorporates path consistency to assess agreement among independent roles, simulating the role with the highest consistency score as the truth. We validated MADC across a range of LLMs (9 models), including the DeepSeek-R1 Distilled Models, on challenging reasoning tasks. MADC consistently demonstrated advanced performance, effectively overcoming MAD's performance bottlenecks and providing a crucial pathway for further improvements in LLM agent scaling.
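The consistency mechanism described in the abstract can be loosely illustrated with a simple agreement vote over the answers produced by independent debate roles. This is a simplified stand-in rather than the paper's MADC algorithm, and it scores agreement by exact match only:

```js
// Loose illustration (not the paper's method): each debate role proposes an
// answer, and the answer agreeing with the most roles is treated as the "truth".
function pickMostConsistent(roleAnswers) {
  let best = { answer: null, score: -1 };
  for (const candidate of roleAnswers) {
    const score = roleAnswers.filter((answer) => answer === candidate).length;
    if (score > best.score) best = { answer: candidate, score };
  }
  return best;
}

// Example: three roles answer the same reasoning question independently.
console.log(pickMostConsistent(['42', '42', '41']));  // { answer: '42', score: 2 }
```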
arXiv:2511.11029v1 Announce Type: new Abstract: In constraint programming and related paradigms, a modeller specifies their problem in a modelling language for a solver to search and return its solution(s). Using high-level modelling languages such as Essence, a modeller may express their problems in terms of abstract structures. These are structures not natively supported by the solvers, and so they have to be transformed into or represented as other structures before solving. For example, nested sets are abstract structures, and they can be represented as matrices in constraint solvers. Many problems contain symmetries, and one very common and highly successful technique used in constraint programming is to "break" symmetries, to avoid searching for symmetric solutions. This can speed up the solving process by many orders of magnitude. Most of these symmetry-breaking techniques involve placing some kind of ordering for the variables of the problem, and picking a particular member under the symmetries, usually the smallest. Unfortunately, applying this technique to abstract variables produces a very large number of complex constraints that perform poorly in practice. In this paper, we demonstrate a new incomplete method of breaking the symmetries of abstract structures by better exploiting their representations. We apply the method in breaking the symmetries arising from indistinguishable objects, a commonly occurring type of symmetry, and show that our method is faster than the previous methods proposed in (Akgün et al. 2025).
arXiv:2511.11017v1 Announce Type: new Abstract: The rapid expansion of e-commerce platforms generates vast amounts of unstructured product data, creating significant challenges for information retrieval, recommendation systems, and data analytics. Knowledge Graphs (KGs) offer a structured, interpretable format to organize such data, yet constructing product-specific KGs remains a complex and manual process. This paper introduces a fully automated, AI agent-driven framework for constructing product knowledge graphs directly from unstructured product descriptions. Leveraging Large Language Models (LLMs), our method operates in three stages using dedicated agents: ontology creation and expansion, ontology refinement, and knowledge graph population. This agent-based approach ensures semantic coherence, scalability, and high-quality output without relying on predefined schemas or handcrafted extraction rules. We evaluate the system on a real-world dataset of air conditioner product descriptions, demonstrating strong performance in both ontology generation and KG population. The framework achieves over 97% property coverage and minimal redundancy, validating its effectiveness and practical applicability. Our work highlights the potential of LLMs to automate structured knowledge extraction in retail, providing a scalable path toward intelligent product data integration and utilization.
arXiv:2511.10857v1 Announce Type: new Abstract: Conventional planning units or urban regions, such as census tracts, zip codes, or neighborhoods, often do not capture the specific demands of local communities and lack the flexibility to implement effective strategies for hazard prevention or response. To support the creation of dynamic planning units, we introduce a planning support system with agentic AI that enables users to generate demand-oriented regions for disaster planning, integrating the human-in-the-loop principle for transparency and adaptability. The platform is built on a representative initialized spatially constrained self-organizing map (RepSC-SOM), extending traditional SOM with adaptive geographic filtering and region-growing refinement, while AI agents can reason, plan, and act to guide the process by suggesting input features, guiding spatial constraints, and supporting interactive exploration. We demonstrate the capabilities of the platform through a case study on the flooding-related risk in Jacksonville, Florida, showing how it allows users to explore, generate, and evaluate regionalization interactively, combining computational rigor with user-driven decision making.
arXiv:2511.10842v1 Announce Type: new Abstract: Knowledge graphs have emerged as fundamental structures for representing complex relational data across scientific and enterprise domains. However, existing embedding methods face critical limitations when modeling diverse relationship types at scale: Euclidean models struggle with hierarchies, vector space models cannot capture asymmetry, and hyperbolic models fail on symmetric relations. We propose HyperComplEx, a hybrid embedding framework that adaptively combines hyperbolic, complex, and Euclidean spaces via learned attention mechanisms. A relation-specific space weighting strategy dynamically selects optimal geometries for each relation type, while a multi-space consistency loss ensures coherent predictions across spaces. We evaluate HyperComplEx on computer science research knowledge graphs ranging from 1K papers (~25K triples) to 10M papers (~45M triples), demonstrating consistent improvements over state-of-the-art baselines including TransE, RotatE, DistMult, ComplEx, SEPA, and UltraE. Additional tests on standard benchmarks confirm significantly higher results than all baselines. On the 10M-paper dataset, HyperComplEx achieves 0.612 MRR, a 4.8% relative gain over the best baseline, while maintaining efficient training, achieving 85 ms inference per triple. The model scales near-linearly with graph size through adaptive dimension allocation. We release our implementation and dataset family to facilitate reproducible research in scalable knowledge graph embeddings.
arXiv:2511.10776v1 Announce Type: new Abstract: Counterfactual decision-making in the face of uncertainty involves selecting the optimal action from several alternatives using causal reasoning. Decision-makers often rank expected potential outcomes (or their corresponding utility and desirability) to compare the preferences of candidate actions. In this paper, we study new counterfactual decision-making rules by introducing two new metrics: the probabilities of potential outcome ranking (PoR) and the probability of achieving the best potential outcome (PoB). PoR reveals the most probable ranking of potential outcomes for an individual, and PoB indicates the action most likely to yield the top-ranked outcome for an individual. We then establish identification theorems and derive bounds for these metrics, and present estimation methods. Finally, we perform numerical experiments to illustrate the finite-sample properties of the estimators and demonstrate their application to a real-world dataset.
arXiv:2511.10767v1 Announce Type: new Abstract: Structural measures of graphs, such as treewidth, are central tools in computational complexity resulting in efficient algorithms when exploiting the parameter. It is even known that modern SAT solvers work efficiently on instances of small treewidth. Since these solvers are widely applied, research interests in compact encodings into (Q)SAT for solving and to understand encoding limitations. Even more general is the graph parameter clique-width, which unlike treewidth can be small for dense graphs. Although algorithms are available for clique-width, little is known about encodings. We initiate the quest to understand encoding capabilities with clique-width by considering abstract argumentation, which is a robust framework for reasoning with conflicting arguments. It is based on directed graphs and asks for computationally challenging properties, making it a natural candidate to study computational properties. We design novel reductions from argumentation problems to (Q)SAT. Our reductions linearly preserve the clique-width, resulting in directed decomposition-guided (DDG) reductions. We establish novel results for all argumentation semantics, including counting. Notably, the overhead caused by our DDG reductions cannot be significantly improved under reasonable assumptions.
arXiv:2511.10705v1 Announce Type: new Abstract: Graphical User Interface (GUI) task automation constitutes a critical frontier in artificial intelligence research. While effective GUI agents synergistically integrate planning and grounding capabilities, current methodologies exhibit two fundamental limitations: (1) insufficient exploitation of cross-model synergies, and (2) over-reliance on synthetic data generation without sufficient utilization. To address these challenges, we propose Co-EPG, a self-iterative training framework for Co-Evolution of Planning and Grounding. Co-EPG establishes an iterative positive feedback loop: through this loop, the planning model explores superior strategies under grounding-based reward guidance via Group Relative Policy Optimization (GRPO), generating diverse data to optimize the grounding model. Concurrently, the optimized Grounding model provides more effective rewards for subsequent GRPO training of the planning model, fostering continuous improvement. Co-EPG thus enables iterative enhancement of agent capabilities through self-play optimization and training data distillation. On the Multimodal-Mind2Web and AndroidControl benchmarks, our framework outperforms existing state-of-the-art methods after just three iterations without requiring external data. The agent consistently improves with each iteration, demonstrating robust self-enhancement capabilities. This work establishes a novel training paradigm for GUI agents, shifting from isolated optimization to an integrated, self-driven co-evolution approach.