Category: Ideas

  • Understanding Reliability at TopHat

    That indecipherable prompt on the screen above is a teacher failing to connect.

    The project that got me the job at TopHat concerned how teachers and students perceived the reliability of the platform. Classroom churn rates were very high, yet no obvious patterns showed up in metrics, telemetry, course construction, type of course, etc. Some in the company dismissed reliability concerns as subjective or perceptual, something to manage rather than fix.

    Topwho?

    TopHat provides a gamified experience to improve student learning in the lecture hall. Students answer timed questions on their phones or laptops when the professor poses them. This works when timing is perfect. The team heard student and teacher complaints about reliability, but nothing correlated with consistently collected in-app data. Students can also be unreliable reporters when grades are at stake: they might skip class and then blame the software.

    In-class points counted toward grades or attendance.

    The question: Is reliability mostly perceptual, or is there a cause we can fix in product or course design?

    The approach: mixed methods combining fieldwork, in-app analytics, and semi-structured interviews. An ideal outcome answers the big question and provides guidance to teams:

    • A way to predict likely reliability problems so we could deploy support when needed
    • A way to assess the severity or likelihood of churn, with a focus on larger, more lucrative classrooms
    • A determination of whether the problem was perceptual or technical

    Protocol

    Start with 10 ‘bad’ and 10 ‘good’ classrooms. Good classrooms rarely report issues; in bad classrooms, professors threaten to churn unless something changes. This first set of 20 helps form hypotheses we can measure by metric and confirm with further onsite evaluation.

    Include a diversity of classes and make detailed observations:

    • Subject and material
    • Class size
    • Nature of reliability claims
    • Student response speed in-app
    • Position of students in the lecture hall
    • Observation of distracted students
    • Speed tests
    • Utilized capacity of the lecture space
    • A metric that could help proactively identify low reliability

    Reliability was technical, but challenging to measure directly in-app

    Reliability correlated with Wi-Fi saturation and configuration. Poor configuration was made worse by the inefficient way slides were sent to student devices: like most startups, TopHat engineered for fast feature development, not for efficiency or speed over limited bandwidth. Large lecture halls sport many Wi-Fi antennas to handle the connection load, but configuration issues on the part of campus IT saturated some antennas and left others unused. Asking a question or advancing a slide delivered a hefty, synchronous data payload to each student device, overwhelming the network and causing delays. This meant missed questions, particularly in lectures with large numbers of slides or image-intensive slides, which could saturate the network.
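
    To make the saturation problem concrete, here is a back-of-envelope sketch in Python. The class size, slide size, and access-point throughput below are illustrative assumptions of mine, not measured TopHat values.

    ```python
    # Back-of-envelope estimate of a synchronous slide push over a shared access point.
    # All numbers below are illustrative assumptions, not measured TopHat values.

    def push_time_seconds(students: int, slide_mb: float, ap_throughput_mbps: float) -> float:
        """Rough time for every student to receive a slide when one access point
        carries the whole class (ignores retries and protocol overhead)."""
        total_megabits = students * slide_mb * 8        # payload fanned out to every device
        return total_megabits / ap_throughput_mbps      # shared-medium transfer time

    # 200 students, a 2 MB image-heavy slide, one saturated AP doing ~50 Mbps:
    print(f"{push_time_seconds(200, 2.0, 50):.0f} s")   # ~64 s, longer than many answer windows

    # The same class spread evenly across four properly configured APs:
    print(f"{push_time_seconds(50, 2.0, 50):.0f} s")    # ~16 s per AP
    ```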

    Reliability of attendance presented a different technical challenge: students could participate via the browser or the phone app, but laptops lack the precision of dedicated GPS receivers. We measured laptop location at 92% accuracy, where phones did better than 98%. More than 2 mis-marked students in a 100-student lecture led to churn within 6 lectures.
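
    A quick sketch of what those accuracy numbers mean in practice. The accuracy figures are the ones we measured; the 40% laptop share is an illustrative assumption.

    ```python
    # Expected mis-marked attendance per lecture, given the measured accuracy of
    # laptop (92%) and phone (98%) location. The 40% laptop share is an assumption.

    def expected_mismarks(students: int, laptop_share: float,
                          laptop_acc: float = 0.92, phone_acc: float = 0.98) -> float:
        laptops = students * laptop_share
        phones = students - laptops
        return laptops * (1 - laptop_acc) + phones * (1 - phone_acc)

    # A 100-student lecture where 40% of students check in from laptops:
    print(round(expected_mismarks(100, 0.40), 1))  # ~4.4 mis-marks, above the ~2 that preceded churn
    ```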

    Students were generally attentive and honest. We sat in the back of these lectures and observed screens. At the time we observed a ‘distracted’ rate of around 10% across the board, but students were very good at task switching to answer a question when it came up. We saw very few examples of student dishonesty. We saw far more examples where a question wouldn’t activate until only a few seconds were left to respond.

    Finding actionable leading indicators

    In student and professor interviews, the common complaint related to latency: questions didn’t appear on a student’s device until after the window for answering had closed. We confirmed this with direct observation in lecture halls.

    Direct measurement proved difficult. When we talk about measuring click streams, it’s within the same person, same device, same app or tab. In this case the team needed to measure the time elapsed from when a professor clicked to pose a timed question to when the question activated on each student’s device. Many factors complicate this direct measurement.
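
    A minimal sketch of what that cross-device instrumentation could look like, assuming each client can correct its local clock against a shared server clock. The event names and fields here are hypothetical, not TopHat’s actual telemetry.

    ```python
    # Minimal sketch of cross-device latency instrumentation. It assumes each client
    # can adjust its local timestamps by a measured offset to a shared server clock.
    # Event names and fields are hypothetical, not TopHat's actual telemetry.
    from dataclasses import dataclass

    @dataclass
    class QuestionEvent:
        question_id: str
        device_id: str
        kind: str            # "posed" on the professor's device, "activated" on a student's
        server_time_ms: int  # local timestamp corrected to server time

    def activation_latencies(events: list[QuestionEvent], question_id: str) -> list[int]:
        """Milliseconds from the professor posing a question to each student seeing it."""
        posed = min(e.server_time_ms for e in events
                    if e.question_id == question_id and e.kind == "posed")
        return [e.server_time_ms - posed for e in events
                if e.question_id == question_id and e.kind == "activated"]
    ```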

    Poor speed test results generally correlated with problem classes. These were measured manually rather than automatically in-app.

    There was no correlation with class size, number of questions, lecture hall utilization, or student NPS scores and comments.

    We saw a medium correlation with the number of lecture slides.

    There was a very strong association between post-lecture grade book adjustments and churn. If a reliability issue meant more work for a professor, it spelled trouble; when it was just complaints, it mattered much less.

    Impact: Reliability was not subjective or a result of student dishonesty.

    We could identify courses of concern in time to intervene: they had more than 4 grade book corrections per 100 students within the first two lectures, met in a lecture hall with known bad Wi-Fi, or had a majority of students using the web app for attendance.
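
    Written as code, the rule looks something like the sketch below. The thresholds are the ones reported above; the function and parameter names are hypothetical.

    ```python
    # The leading-indicator rule as code. Thresholds are the ones reported above;
    # the function and parameter names are hypothetical.
    def course_at_risk(corrections_per_100: float, lectures_held: int,
                       known_bad_wifi: bool, web_app_attendance_share: float) -> bool:
        early_corrections = lectures_held <= 2 and corrections_per_100 > 4
        mostly_web_app = web_app_attendance_share > 0.5
        return early_corrections or known_bad_wifi or mostly_web_app

    # Two lectures in, 5 corrections per 100 students, decent Wi-Fi, mostly phone check-ins:
    print(course_at_risk(5, 2, False, 0.3))  # True -> worth proactive support
    ```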

    Reliability became a business concern when professors took on extra work to correct grade books. With an average of 12 lectures per course, that extra work piled up and professors churned.

    Actionable improvements: the team moved to reduce the size of the slides sent to students. This provided a modest benefit.

    When we identified a likely ‘poor classroom’, we could work with campus IT to resolve the underlying configuration issues.

    We introduced a regular practice of bringing engineers and product managers to class. This paid dividends later on in terms of engineering velocity and empathy.

    As a result of this, TopHat established a full-time research team which regularly and consistently visited classrooms to measure everything from how projector resolution and ambient light affected questions to how frequency and randomness in asking questions affected student attention.

  • Wanna make the journey stick? Use the finest learning instrument.

    Activity trumps Lecture, every time.

    Cross-functional teams often struggle to build a shared sense of a problem space. Designers and researchers use frameworks like user journeys and personas to solve for this. Teams praise the presentation, then promptly forget it. Complicating this, design and research teams constantly tinker and refine; to outsiders, the churn just makes things worse.

    So… Enter the board game

    The game was cooked up in a frantic 5 days.

    Specifically, this game is for a company that sells to users and buyers involved in the first stages of pre-clinical drug discovery. The game represents the content of journeys and personas in a way that lets a team feel and remember through experience, rather than focus on ‘false details.’ Games can backtrack and circle where journeys tend to make everything a line.

    How to play: 2-4 teams of 3-4 players compete to prove the viability of a target for creating a new medicine (the underlying journey). Each team represents a pharma company. Players choose different roles with different special powers (the personas). The game ends when one team reaches the final square after successfully solving an “in vitro” and an “in vivo” experimental puzzle. Gameplay combines elements of Chutes and Ladders-style games with “Clue.” We typically play for about 45 minutes, followed by discussion.
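
    For readers who think in code, here is a toy model of that flow in Python. The odds and board length are placeholders of my own, and the persona-based role powers are omitted; it is a sketch of the structure, not the real game content.

    ```python
    # A toy model of the game flow: teams advance Chutes-and-Ladders style and must
    # solve both experimental puzzles before reaching the final square. The odds and
    # board length are placeholders, and the persona-based role powers are omitted.
    import random

    PUZZLES = {"in vitro", "in vivo"}  # the two experimental puzzles

    def play(teams: int = 3, board_length: int = 30, seed: int = 1) -> str:
        rng = random.Random(seed)
        position = {f"Team {t + 1}": 0 for t in range(teams)}
        solved = {team: set() for team in position}
        while True:
            for team in position:
                position[team] += rng.randint(1, 6)            # roll and move
                if rng.random() < 0.25:                        # a chance to crack a puzzle this turn
                    solved[team].add(rng.choice(sorted(PUZZLES)))
                # winning requires both puzzles solved *and* reaching the final square
                if solved[team] == PUZZLES and position[team] >= board_length:
                    return f"{team} proves the target is viable"

    print(play())
    ```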

    The game arose out of a meta-analysis of existing research, supplementary interviews, and document analysis, all aimed at evaluating the likely market perception and pitfalls of a major new offering: is the offering a hit or a miss, and what might the company do to move it closer to a hit? I also did the full workup of more traditional personas and journeys.

    Did it work?

    People in and out of design retained knowledge about stages and pain points better.

    It also helped researchers open up to trying different ways to communicate, knowing we have to try and fail if we are to speak with an effective voice.

    The cross-functional teams of engineers, product managers, staff scientists, and designers worked better together.

    Engineers better understood the value of some of the big engineering tickets. They also participated more in the life of the product.

    Three years later, people still approached me with memories of when they played the game.

    — Jeremy