The Center for Human-AI Innovation in Society

Consensus Beats Google Scholar in Education Leadership Pilot Study

An Interview with Dr. Seth Hunter
Associate Professor of Education Leadership at George Mason University
This article first appeared on https://consensus.app/ on September 4, 2025

Federal law requires schools to use evidence-based interventions, yet only eight interventions are officially approved to address the thousands of challenges schools face. A pilot study run by Dr. Seth Hunter found that aspiring principals using Consensus were statistically significantly more likely to find research useful and easy to use than peers relying on traditional methods like Google Scholar, suggesting AI could finally bridge the gap between academic research and school practice.

The Evidence Gap

Every year, thousands of education studies are published with the potential to improve schools. Yet principals and policymakers rarely use this evidence. Without tools or training to find and interpret research, they rely on generic interventions that check compliance boxes but don’t help students. Progress stalls, and children miss out on solutions that could change their lives.

The Scale of the Gap is Staggering

Since 2002, when the No Child Left Behind Act took effect, federal law has required underperforming schools to use “evidence-based” interventions backed by rigorous, peer-reviewed research. Yet in practice, only eight interventions have cleared federal review and are readily accessible to school leaders, leaving them with far too few options for the thousands of challenges they face. With so few choices, principals often pick from the list simply to stay in compliance, not because the options match their schools’ needs.

Take declining student attendance. Schools send automated text reminders to parents when the real issue might be bullying that requires counseling. Math score declines trigger sweeping reforms when the actual problem is vocabulary gaps affecting specific student groups. The intervention fails, but compliance boxes get checked.

Testing Consensus as a Bridge

Hunter wondered if AI could break this cycle.

To find out, he designed a pilot study with 40 aspiring principals in Mason’s Education Leadership MEd program, randomly assigning half to use an AI-enabled workflow built around Consensus, while the control group relied on traditional methods like Google Scholar and federal repositories. Hunter chose Consensus because it addressed two critical problems: its filters help surface rigorous studies that meet federal requirements, and its clear summaries make intimidating academic articles accessible to busy school leaders.

Product shot of results on consensus.app

Transformative Results

The results were striking. Students using the Consensus-based workflow said it was significantly easier to find and understand rigorous research. On surveys, they were statistically significantly more likely than the control group to agree that the tools would be useful in their future roles as education leaders and that the tools were easy to use (effect sizes of +0.16 and +0.17 SD, respectively). Even more importantly, for the first time since 2018, Hunter’s pre-service principals told him they could see themselves applying this caliber of research in schools. For him, that was a breakthrough.
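
For context on effect sizes reported in standard-deviation units: a standardized mean difference divides the gap between group means by the pooled standard deviation. A minimal sketch with made-up survey numbers (illustrative only; not the study’s data or analysis code):

```python
import numpy as np

def cohens_d(treatment, control):
    """Standardized mean difference: (mean_t - mean_c) / pooled SD."""
    t = np.asarray(treatment, dtype=float)
    c = np.asarray(control, dtype=float)
    nt, nc = len(t), len(c)
    # Pooled variance using Bessel-corrected group variances
    pooled_var = ((nt - 1) * t.var(ddof=1) + (nc - 1) * c.var(ddof=1)) / (nt + nc - 2)
    return (t.mean() - c.mean()) / np.sqrt(pooled_var)

# Hypothetical 5-point survey responses ("this tool will be useful in my role")
consensus_group = [4, 5, 4, 4, 5, 3, 4, 5]
control_group = [4, 3, 4, 4, 3, 4, 3, 4]
print(f"effect size: {cohens_d(consensus_group, control_group):+.2f} SD")
```

By common benchmarks, effects in the 0.1 to 0.2 SD range are small, consistent with the pilot’s framing of these results as a promising early signal rather than a definitive finding.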

Quote card featuring Dr. Seth Hunter

These findings suggest that AI-enabled tools like Consensus could meaningfully shrink the gap between technical academic research and K-12 research use. Hunter envisions the potential extending far beyond compliance: math coaches could have rigorous research “in their back pocket” to help teachers improve vocabulary instruction for specific student populations, or principals could quickly find evidence-based strategies for emerging challenges.

Scaling to Schools

Hunter is careful to note the study’s limitations. It was a classroom pilot based on self-reports, not causal evidence of impact on actual plan quality or student outcomes. Still, the early signals are promising. His team is now partnering with the Virginia Department of Education to test the workflow with practicing principals and district leaders. They plan to use large language models to systematically analyze school improvement plans statewide. The goal is to compare the quality and relevance of interventions chosen by districts using the AI workflow with those chosen by districts relying on traditional methods. This will provide the first rigorous assessment of whether AI tools can actually improve educational decision-making at scale.
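
As a rough illustration of what LLM-based review of improvement plans could look like, here is a minimal sketch. The rubric, prompt, and model name below are hypothetical assumptions for illustration; the team’s actual workflow is not described in this article.

```python
# Hypothetical sketch of LLM-assisted scoring of a school improvement plan.
# The rubric, prompt wording, and model choice are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC_PROMPT = """You are reviewing a school improvement plan.
Rate each criterion from 1 (weak) to 5 (strong) and return JSON with keys:
evidence_basis, problem_alignment, specificity.
Criteria:
- evidence_basis: interventions cite rigorous, peer-reviewed research
- problem_alignment: interventions match the diagnosed problem
- specificity: concrete steps, timelines, and responsible staff
Plan:
{plan_text}
"""

def score_plan(plan_text: str) -> dict:
    """Ask the model for rubric scores and parse its JSON reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": RUBRIC_PROMPT.format(plan_text=plan_text)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

In a real deployment, scores like these would need to be validated against expert human ratings before being used to compare districts.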

Hunter’s work represents only the beginning of what is possible when AI helps translate rigorous research into practical solutions. As more educators gain access to these tools, evidence-based decision making may finally become the norm rather than the exception in schools. The question is not whether AI can help, but how quickly we can scale these approaches to reach the students who need them most.

If you are using Consensus in education or in another field where research translation matters, we would like to hear from you. Contact us at [email protected] to share how AI-powered research tools are changing the way you work.

Disclaimer: This article describes a classroom pilot led by Dr. Seth Hunter. The views expressed are his own and do not represent George Mason University.

CHAIS Newsletter September 16, 2025

1. Research-to-Practice Seminar
Research Collaborations with Fairfax Fire Department
📅 Tuesday, September 30 | 10:00 a.m.–12:00 p.m.
📍 Hybrid (In-person: Fuse Room 6333 | Online: Zoom link: https://gmu.zoom.us/j/92424464450) 

CHAIS’ Craig Yu and Myeong Lee will lead a discussion showcasing their collaboration with the Fairfax Fire Department. They will share key research breakthroughs, lessons learned from the partnership, and perspectives on working with local government. Representatives from Fairfax Fire Department will also reflect on the partnership—its challenges, impact, and vision for the future. 

If you are interested in developing partnerships with local governments, engaging in research-to-practice collaborations, or becoming part of the ongoing partnership with Fairfax Fire Department, we encourage you to join this event. While participation information is provided above, please also sign up so we, the organizers, know you plan to attend – https://forms.office.com/r/wWXHv0cpRh 

Check updates and the agenda for this seminar here – https://chais.gmu.edu/event/research-to-practice-seminar/ 

2. Leadership & Engagement Opportunities at CHAIS 

Special thanks to Alan Shark, who has agreed to serve as CHAIS’ first Outreach Committee Chair. This committee will design and lead outreach efforts to share the research of CHAIS faculty and build industry and government relationships, including through in-person engagements, webinars, websites, and social media. 

We are seeking two additional members to join the Outreach Committee. If you enjoy outreach activities and have strong internal or external networks, we encourage you to join Alan in this important effort. 

We are also seeking members to take on leadership roles in other CHAIS initiatives—such as a Board of Advisors, peer-mentoring programs, a proposal development working group, and more. If you are interested in contributing to these initiatives, or launching a new one, we look forward to hearing from you. 

3. Upcoming Event: CyberAI Summit 2025: AI, Careers, and IBM 

📅 Monday, September 22, 2025 | 10:00 a.m.–3:00 p.m. (lunch provided)
📍 1201 Merten Hall, 4441 George Mason Blvd, Fairfax, VA 22030
(Online participation available) 

Registration is required. Learn more and register here – https://crc.gmu.edu/event/gmu-cyberai-summit-ai-careers-and-ibm/ 

This one-day summit will explore how AI, cybersecurity, and enterprise infrastructure are shaping the future of technology. Participants will gain hands-on experience with IBM platforms, including LinuxONE – a free Linux virtual machine for coursework, projects, and research, available for faculty and students. In-person attendees can earn an exclusive badge.  

Please share with your colleagues and interested students. Hope to see you there.  

 4. Women Executives in Tech Circle 

Peng is coordinating the Women Executives in Tech Circle, sponsored by Mason’s Cyber Resilience Center (CRC), which brings together a cohort of women leaders in technology for peer mentoring, shared learning, and discussions with experts in tech and leadership. 

If you or someone you know may be interested, please find more information and join here – https://crc.gmu.edu/women-executives-in-tech-circle/ 

 5. Faculty Research Spotlight 

 Thema Monroe-White 

Thema’s co-authored paper, “Social Networks and Entrepreneurial Outcomes of Women Founders: An Intersectional Perspective,” received the Best Paper Award this past July at the 2025 Diana International Research Institute Conference in Auckland, New Zealand. 

Thema presented her co-authored research, “Echoes of Eugenics: Tracing the Ideological Persistence of Scientific Racism in Scholarly Discourse,” at the 29th International Conference on Science and Technology Indicators (STI-ENID) this September in Bristol, UK. This project uses machine learning and natural language processing (NLP) techniques to trace ideological bias in scholarly publications over time. 

New publication: Shieh, E., & Monroe-White, T. (2025, August). Teaching Parrots to See Red: Self-Audits of Generative Language Models Overlook Sociotechnical Harms. In Proceedings of the 2025 AAAI Summer Symposium Series (Vol. 6, No. 1, pp. 333-340). https://ojs.aaai.org/index.php/AAAI-SS/article/view/36070  

 Craig Yu 

Craig in the news: 

Craig’s recent award: Mason 2025 Innovators Award (Digital Innovation) 

Craig’s recent publications:  

  • Charles Ahn, Ashaki SetepenRa-Deloatch, Ubada Ramadan, Quang Vo, Jacob Matthew Wojtecki, Nathan Alan Moy, Ching-I Huang, Bo Han, Songqing Chen, Carley Fisher-Maltese, Lap-Fai Yu, Mohamed Alburaki, “Teleoperated 360 Video Capture of Beehives for Scientific Visualization in VR”, Research Demo, ACM Symposium on Virtual Reality Software and Technology (VRST), 2025 
  • William Ranc, Thanh Nguyen, Liuchuan Yu, Yongqi Zhang, Minyoung Kim, Haikun Huang, Lap-Fai Yu, “Multi-Player VR Marble Run Game for Physics Co-Learning”, Research Demo, IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2025 
  • Changyang Li, Qingan Yan, Minyoung Kim, Zhan Li, Yi Xu, Lap-Fai Yu, “Crafting Dynamic Virtual Activities with Advanced Multimodal Models”, IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2025 

To be featured in a future faculty research spotlight, please submit achievements (grants, publications, awards, recognition) you received during the last three months here – https://forms.office.com/r/jh04Hi3b58 

6. Student Research Spotlight 

We are adding a new feature to the CHAIS website to showcase current research topics and projects by doctoral students. Please encourage your students to submit a 300–500 word research description for consideration. Selected submissions will be published on CHAIS.gmu.edu along with a short bio and headshot. Priority will be given to projects that are highly relevant to CHAIS and demonstrate interdisciplinary thinking, strong research design, and potential for broad impact. 

Please invite qualified students to submit here – https://forms.office.com/r/wVQ0K0u7D3 

7. Funding Opportunities 

NSF 

  1. AI featured funding overview – https://www.nsf.gov/focus-areas/artificial-intelligence#featured-funding-13c

  2. Seed fund on AI – https://seedfund.nsf.gov/topics/artificial-intelligence/ 

NIH 

AI featured funding overview – https://datascience.nih.gov/artificial-intelligence 

Bridge2AI – https://commonfund.nih.gov/bridge2ai 

NEH 

AI featured funding overview – https://www.neh.gov/AI 

Humanities Research Centers on AI – https://www.neh.gov/program/humanities-research-centers-artificial-intelligence 

Department of Education 

AI featured funding guidance – https://www.ed.gov/about/news/press-release/us-department-of-education-issues-guidance-artificial-intelligence-use-schools-proposes-additional-supplemental-priority 

SBIR (eligibility – small business) – https://ies.ed.gov/funding/research/programs/small-business-innovation-research-sbir/solicitation-information 

DoD 

AI Next Campaign – https://www.darpa.mil/research/programs/ai-next-campaign 

DAF AI Accelerator Fellowship – https://www.aiaccelerator.af.mil/Phantom-Program/ 

Run by the U.S. Air Force and MIT, this fellowship program places selected “Phantoms” into AI research teams to: 

  • Work on real-world DoD AI projects. 
  • Receive advanced AI training. 
  • Influence acquisition and policy for ethical AI deployment. 

It’s a five-month immersive experience for military and civilian personnel focused on AI innovation and implementation. 

DAF AI Launch Point – https://www.dafcio.af.mil/AI/ 

This is the central AI innovation hub for the Department of the Air Force. It supports: 

  • AI strategy and policy development. 
  • Cross-agency collaboration on AI R&D. 
  • Launching new AI pilot programs and partnerships. 

DOE (Department of Energy) 

Advanced Scientific Computing Research, Basic Energy Sciences, Biological and Environmental Research, Fusion Energy Sciences, High Energy Physics, Nuclear Physics, Isotope R&D and Production, and Accelerator R&D and Production – https://science.osti.gov/grants/FOAs/-/media/grants/pdf/foas/2024/DE-FOA-0003432.pdf 

Private Sector / Philanthropy: 

Google Academic Research Awards – https://research.google/programs-and-events/google-academic-research-awards/ (via https://research.google/) 

Please let us know of other opportunities to include in the next CHAIS newsletter. 

 8. CHAIS Listserv 

If you do not want to be on this listserv, please let us know. Please also let us know if you would like to invite someone to join the listserv. 

AI auditing AI: Towards digital accountability

By Alan Shark

This article was originally published by Route Fifty. Republished here with the author’s permission.

Artificial intelligence systems are now making decisions in policing, hiring, healthcare, cybersecurity, purchasing and finance — but errors or biases can have significant consequences.

Humans alone can’t keep up: models are too complex, too fast, and too large in scope. And yet, nearly every AI policy states that humans must provide oversight and control. Worse, some overseers admit to over-reliance on AI applications. This is where the idea of AI systems designed to check other AI systems comes in.

Traditionally, humans have performed this role. Auditors, compliance officers, regulators and watchdog organizations have long worked to ensure systems operate as intended. But when it comes to AI, humans alone may no longer be enough. The models are too complex, too fast, and too embedded in decision-making pipelines for manual oversight to keep pace.

That’s why researchers and practitioners are turning to an intriguing solution: using AI itself to audit AI. Recognizing the impact of AI on government applications, in 2021, the Government Accountability Office developed an ahead-of-its-time report, “Artificial Intelligence — An Accountability Framework for Federal Agencies and Other Entities.” Although the framework was practical and far-reaching, it still relied on human planning and oversight.

Today, we are entering a new era of AI accountability, with talk of the advent of “watchdog AIs” or “AI auditors” that test, verify and monitor other AI models. This is increasingly important as AI grows more complex and less transparent to human reviewers.

The case for AI auditing rests on what machines do well: AI can rapidly analyze outputs across millions of data points, and unlike human auditors, it doesn’t get tired or overlook details. Auditing can occur in real time, flagging problems as they arise, and AI auditors can probe “black box” models with tests humans couldn’t run manually. Taken together, AI auditing’s strengths are scale, consistency, speed, transparency, and accuracy.

Auditing AI is not a single technology but a suite of methods. Some of the most promising approaches include:

  • Adversarial testing: One AI generates tricky edge cases designed to trip up another AI, exposing blind spots.
  • Bias and fairness detection: Auditing systems measure outcomes across demographic groups to reveal disparities.
  • Explainability tools: Specialized models analyze which factors most influenced a decision, helping humans understand why a model reached its conclusion.
  • Continuous monitoring: AI can watch for “model drift” — when performance degrades over time as data or circumstances change — and signal when retraining is needed. (A minimal sketch of two of these checks follows this list.)
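
To make two of these methods concrete, here is a minimal Python sketch of a bias check and a drift monitor. The metrics, thresholds, and synthetic data are illustrative assumptions, not an auditing standard or any vendor’s implementation.

```python
# Illustrative only: toy versions of two auditing checks described above.
import numpy as np
from scipy.stats import ks_2samp

def selection_rate_gap(decisions, groups):
    """Bias check: largest gap in positive-decision rates across groups."""
    rates = {g: decisions[groups == g].mean() for g in np.unique(groups)}
    return rates, max(rates.values()) - min(rates.values())

def drift_check(reference_scores, recent_scores, alpha=0.01):
    """Continuous monitoring: two-sample KS test between score distributions."""
    stat, p_value = ks_2samp(reference_scores, recent_scores)
    return {"statistic": stat, "p_value": p_value, "drift": p_value < alpha}

rng = np.random.default_rng(0)

# Synthetic hiring-style decisions for two demographic groups
groups = rng.choice(["A", "B"], size=2000)
decisions = (rng.random(2000) < np.where(groups == "A", 0.45, 0.30)).astype(int)
rates, gap = selection_rate_gap(decisions, groups)
print(f"selection rates: {rates}, gap: {gap:.2f}")

# Synthetic model scores before and after conditions change
reference = rng.normal(0.70, 0.10, size=5000)
recent = rng.normal(0.62, 0.12, size=1000)
result = drift_check(reference, recent)
if result["drift"]:
    print(f"Drift flagged (p = {result['p_value']:.2e}); consider retraining.")
```

In practice, an auditing AI would run checks like these continuously and escalate flags to human reviewers, consistent with the layered approach described later in this article.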

In many ways, this mirrors how cybersecurity works today, where red teams and intrusion-detection systems constantly test defenses. Here, the target is not a firewall but another algorithm.

Real-world applications are emerging. Though still in its early stages, AI auditing is moving beyond theory. Here are several examples:

  • Finance: Some firms are already deploying AI to double-check fraud-detection models, ensuring that suspicious activity flags are consistent and not biased.
  • Healthcare: AI-driven validation tools are being used to test diagnostic algorithms, checking their accuracy against known patient outcomes.
  • Cybersecurity: “Red team” AIs are being trained to attack models the way hackers might, helping developers harden systems before release.
  • Public sector pilots: Governments are beginning to experiment with algorithmic auditing programs, often in regulatory “sandboxes” where new models are tested under close supervision.

These examples suggest a growing recognition that human oversight must be paired with automated oversight if AI is to be trusted at scale. At the same time, we must acknowledge that AI auditing raises its own risks and limitations, including the following:

  • The infinite regress problem: If one AI audits another, who audits the auditor? At some point, humans must remain in the loop. Or perhaps there might be a third level of AI checking on AI, checking on AI.
  • Shared blind spots: If both models are trained on similar data, they may replicate the same biases rather than uncover them.
  • Over-trust: Policymakers and managers may be tempted to rely too heavily on “AI-certified AI” without questioning the underlying process.
  • Resource costs: Running parallel AI systems can be expensive in terms of computing power and energy consumption.

In short, as tempting as it may appear, AI auditors are not a panacea. They are tools—powerful ones, but only as good as their design and implementation.

This raises critical governance questions. Who sets the standards for AI auditors? Governments, industry consortia, or independent third parties? Should auditing AIs be open-source, to build public trust, or proprietary, to protect against exploitation? And how do we ensure accountability when the auditors themselves may be opaque? Can or should AI auditing be certified, and if so, by whom?

There are strong arguments for third-party, independent auditing — similar to how financial auditing works today. Just as markets rely on trusted external auditors, the AI ecosystem will need its own class of independent algorithmic auditors. Without them, self-auditing could resemble letting the fox guard the henhouse.

Most experts envision a layered approach: humans define auditing standards and interpret results, while AI handles the heavy lifting of large-scale checking. This would create multiple levels of defense — primary AI, auditing AI and human oversight.

The likely result will be a new industry built around AI assurance, certification, and compliance. Just as accounting gave rise to auditing firms, AI may soon give rise to an “AI auditing sector” tasked with keeping digital systems honest. And beyond the technical details lies something more important: public trust. The willingness of people to accept AI in critical domains may depend on whether robust and credible audit mechanisms exist.

AI auditing AI may sound strange at first, like machines policing themselves. But far from being a case of “the fox guarding the henhouse,” it may prove essential to making AI safe, reliable and trustworthy. The truth is, humans cannot realistically keep up with the scale and complexity of today’s AI. We need allies in oversight — and in many cases, the best ally may be another AI. Still, human judgment must remain the final arbiter.

Just as financial systems depend on auditors to ensure trust, the AI ecosystem will need its own auditors—both human and machine. The future of responsible AI may well depend on how well we design these meta-systems to keep each other in check.

Dr. Alan R. Shark is a senior fellow at the Center for Digital Government and an associate professor at the Schar School of Policy and Government, George Mason University, where he also serves as a faculty member at the Center for Human-AI Innovation in Society (CHAIS). Shark is also a senior fellow and former Executive Director of the Public Technology Institute (PTI). He is a Fellow of the National Academy of Public Administration and Founder and Co-Chair of the Standing Panel on Technology Leadership. Shark is the host of the podcast Sharkbytes.net.

CEHD and University Libraries are building a tool to improve navigating education research

Faculty across the College of Education and Human Development (CEHD), working with University Libraries, are exploring AI-powered ways to bridge the gap between education research and practice—exemplifying a dual focus on advancing 21st-century education for all and driving responsible digital innovation in service of the public good.

Seth Hunter

Seth Hunter, associate professor of education leadership and policy and senior fellow of EdPolicyForward: George Mason’s Center for Education Policy, collaborated with University Libraries to secure seed funding from EBSCO—a library database service—to develop a tool that uses artificial intelligence (AI) to help education practitioners find and interpret evidence-based practices and research that can be implemented to improve student outcomes.

Before coming to George Mason, Hunter worked in state and local education agencies, where “research in educational studies in K-12 was available, but not being used to support important decisions at district and state levels. The disconnect between research and practice and policymaking was fascinating to me.”

This wasn’t exclusive to education agencies; the same patterns appeared in local school districts as well. Beth Davis, PhD Education ’23 and postdoctoral fellow in EdPolicyForward, noted that she saw this gap in action.

“When I worked at a high school in Maryland, there were so many challenges that I knew there were better ways to do things than what we were doing, but I also didn’t feel like we had the time to figure out what those things might be,” she said.

Beth Davis

As Davis pointed out, an efficient and effective search takes time, even for the most seasoned researcher. Combing through search results, sorting books from articles from reviews, tracking new keywords and phrases: It can be a challenging, and time-consuming, effort just to find the articles you need. And that doesn’t account for the time required to read and interpret the content once found.

As AI grew in its capabilities, Hunter saw an opportunity to leverage this new technology to perhaps bridge that research/practice gap.

“We’re looking at bespoke AI tools that are out there already to help practitioners find and translate research in ways that would enable school improvement for student benefit while meeting federal and state policy guidelines,” explained Hunter.

Early results show promise. In a pilot within the Education Leadership MEd program, students asked to search for articles using AI-powered tools found the process much easier than traditional search methods, and they reported that the AI helped them understand the research by succinctly summarizing the contents in more accessible language.

“It could streamline the process of making evidence-based practices and research easier to find and therefore implement,” said Davis.

Christopher Lowder

As the primary provider of access to scholarly research, University Libraries is central to this project. Hunter recruited Christopher Lowder, education subject librarian, to lend his expertise to developing the tool.

“Having computers go through information quickly is not new,” said Lowder, “but now we’re thinking about how this new generation of AI can make the research content understandable to the average user. It’s almost like translation, or decoding.” For example: Can the AI alert the user if the study meets the standards set forth by federal and state Departments of Education?
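
One way to picture such an alert (purely illustrative; the article does not describe the tool’s actual logic) is a simple screen that maps a study’s design to ESSA-style evidence tiers, the kind of criteria federal guidance references:

```python
# Purely illustrative: a toy mapping from study design to ESSA-style evidence
# tiers. Real federal review weighs far more than design type (sample size,
# outcome measures, baseline equivalence, and so on).
ESSA_TIERS = {
    "randomized_controlled_trial": (1, "strong evidence"),
    "quasi_experimental": (2, "moderate evidence"),
    "correlational_with_controls": (3, "promising evidence"),
}

def evidence_tier_alert(design: str) -> str:
    """Return a user-facing alert for a study's evidence tier."""
    tier = ESSA_TIERS.get(design)
    if tier is None:
        return "No ESSA tier 1-3 match; flag for manual review."
    level, label = tier
    return f"Meets ESSA Tier {level} ({label})."

print(evidence_tier_alert("randomized_controlled_trial"))
print(evidence_tier_alert("case_study"))
```

A production tool would more likely combine structured metadata like this with an AI’s reading of the full text, but the alert concept is the same.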

For Hunter, bringing the public back to university libraries is a secondary benefit of the project. “I think university library systems are such a huge value-add for the public good,” said Hunter. “Libraries set universities apart from many other knowledge organizations.”

Hunter also notes that a tool like this supports fiscal responsibility. With evidence-based practices, leadership can be confident they are putting resources behind an initiative that has been empirically tested and verified. “We want to enable better decision-making so leaders can be better stewards of public resources that can improve students’ lives.”

As one of the top 20 universities for innovation in the United States and a leader in AI innovation in Virginia, George Mason is working to advance research for the campus community and beyond. Multidisciplinary teams like this are, for Hunter, a core tenet of this, and necessary to the development of bold solutions to the grand challenges of our generation.

“We have these permeable boundaries within units and across colleges,” said Hunter. “I mean, this team illustrates that: We have CEHD faculty, we have a librarian, we’ll be working with computer scientists…we are encouraged to engage in transdisciplinary research and that feeds innovation.”

Lowder agreed. “George Mason encourages us to try things in new and different ways and to include different voices within the conversation, and our leadership genuinely believes that our work can do good not just for the university but also for the wider community. I believe it’s a public good for information to be findable, understandable, and accessible.”
Originally posted on August 28, 2025, by Sarah Holland at https://www.gmu.edu/news/2025-08/cehd-and-university-libraries-are-building-tool-improve-navigating-education-research.