Journal of Housing & Community Development
Featured Story

Curious How Public Housing Agencies Are Scoring Under NSPIRE? Four Initial Takeaways from the First Release of NSPIRE Data for Public Housing

December 19, 2024
by Andrew Van Horn and Richa Goel

Executive Summary 

NAHRO has made four findings after conducting an analysis of the public housing portfolio’s first round of NSPIRE data. 

  1. The public housing portfolio has seen years of declining funding, but NSPIRE scores are generally an improvement over those same properties’ most recent UPCS inspections.  
  2. HUD really will fail agencies with too many significant deficiencies in living units.  
  3. For agencies with the Small Rural designation, the first NSPIRE scores have a different distribution with relatively fewer scores falling in the middle. 
  4. The data HUD has provided show NSPIRE inspections being conducted at a rate of roughly 48 per month.

Each finding is discussed below. 

Background

Since 2019, HUD has worked to implement a new physical inspection protocol for all of its housing programs. Previously, public housing was assessed via the Uniform Physical Condition Standards (UPCS) protocol. In July 2023, all of the notices governing the new protocol, the National Standards for the Physical Inspection of Real Estate (NSPIRE), were published, ending UPCS inspections.1  

Agencies did not know exactly when inspections would resume under NSPIRE, how many developments would be inspected immediately, or how their scores would change. Public housing physical inspection scores have tangible consequences for Public Housing Agencies (PHAs): they directly affect an agency's overall performance rating and can affect funding and compliance requirements.  

This report takes a first look at the patterns emerging in these new scores.  

One final background note: NSPIRE replaced UPCS for other HUD programs too, including Project-Based Rental Assistance (PBRA). Because those data were just released, NAHRO will issue a similar report for PBRA physical inspection scores as well. 

Data and Methodology

This report relies on HUD data that was publicly available at the time of its publication. Many of these datasets have been replaced with newer versions and can no longer be found online.2 NAHRO thanks the Public and Affordable Housing Research Corporation (PAHRC) for helping NAHRO identify missing data and providing additional datasets no longer published online.

This report uses the first release of NSPIRE data. Only 635 NSPIRE inspections have been published, and just 631 public housing developments have received both NSPIRE and UPCS inspections. Compared to the quantity of UPCS inspections, the number of NSPIRE inspections is too low to draw final conclusions about how the protocol is working. Instead, this analysis is meant to provide agencies with a glimpse of how scores are changing based on the first inspections completed.

For more information on the data and methodology used to prepare this report, see the “Data and Methodology Appendix.”

The Findings

Finding #1: Agencies with both UPCS and NSPIRE scores saw better numbers under NSPIRE

Whether and how scores changed under NSPIRE is not a simple question, because answering it means comparing two sets of data: UPCS scores and NSPIRE scores. There are three ways to make that comparison: all UPCS scores from 2005 to 2023 vs. all NSPIRE scores; the final UPCS score for each property vs. all NSPIRE scores; or, for projects that have had both types of inspection, the final UPCS score vs. the corresponding NSPIRE score for that same sample. The last option most resembles an experiment in which outcomes are measured before and after changing an independent variable: the final UPCS score for each project vs. the NSPIRE score for that same project. The first two methods use more data, which creates huge disparities in the size of the two samples being compared (roughly 41,000 and 8,000 UPCS scores, respectively, vs. just 635 NSPIRE scores) and measures the portfolio across several presidential administrations.
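As a rough illustration of how these three comparison samples could be assembled from the published inspection files, the minimal pandas sketch below builds all three. The file name and column names (development_id, protocol, inspection_score, inspection_date) are hypothetical placeholders rather than HUD's actual field names.

```python
import pandas as pd

# Hypothetical combined inspection file; HUD's actual files and field names
# differ, so treat the file name and columns below as placeholders.
inspections = pd.read_csv("inspections.csv", parse_dates=["inspection_date"])

upcs = inspections[inspections["protocol"] == "UPCS"]
nspire = inspections[inspections["protocol"] == "NSPIRE"]

# Method 1: every UPCS score (2005-2023) vs. every published NSPIRE score.
method1_upcs, method1_nspire = upcs, nspire

# Method 2: only the most recent UPCS score per development vs. all NSPIRE scores.
last_upcs = (upcs.sort_values("inspection_date")
                 .drop_duplicates("development_id", keep="last"))
method2_upcs, method2_nspire = last_upcs, nspire

# Method 3: the paired sample -- final UPCS score vs. NSPIRE score for the
# developments that have received both inspection types.
paired = last_upcs.merge(nspire, on="development_id",
                         suffixes=("_upcs", "_nspire"))
```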

Using either of the first two methods, all available UPCS scores or the final UPCS score for every property, results in a negative change when moving to NSPIRE. See the Finding #1 Appendix for details on these trends. However, comparing the final UPCS score for projects that have had NSPIRE inspections with their corresponding NSPIRE score shows a positive shift overall, as shown in Figure 1.

Figure 1: Each dot represents the change in score for a public housing development that has had both a UPCS and an NSPIRE inspection

Overall, more developments improved their scores than did not from their last UPCS inspection to their first NSPIRE inspection. A difference of means test found that, on average, NSPIRE scores were 13 points higher than the final UPCS scores for the 631 public housing developments that have had both inspection types. This difference is statistically significant; see the Finding #1 Appendix for this test's statistical output. On the one hand, this result should not be surprising. Developments that have had both inspection types had an average final UPCS score of 60.5, a sign that HUD tried to get out first to properties with the worst or oldest scores. But by that logic, it would be reasonable to expect NSPIRE scores to be lower than final UPCS scores: these are the same developments, and they are now older than when their final UPCS inspections were conducted. The fact that NSPIRE scores are markedly and significantly higher may mean that the new inspections really are paying attention to different features and weighting the deficiencies they find differently.
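The 13-point figure comes from a paired difference of means test; the full output is in the Finding #1 Appendix. A minimal sketch of how such a test could be run on the hypothetical paired sample built in the earlier sketch:

```python
from scipy import stats

# Paired t-test: each development contributes its final UPCS score and its
# NSPIRE score, so the two samples are matched one-to-one (n = 631).
result = stats.ttest_rel(paired["inspection_score_nspire"],
                         paired["inspection_score_upcs"])

mean_change = (paired["inspection_score_nspire"]
               - paired["inspection_score_upcs"]).mean()
print(f"average change: {mean_change:+.1f} points, p-value: {result.pvalue:.2e}")
```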

NSPIRE and UPCS scores are also positively correlated, as shown in Figure 2.

Figure 2: A visual representation of the relationship between UPCS and NSPIRE scores

The line of best fit means that, on average, a one-point increase in final UPCS score is associated with a 0.488-point increase in NSPIRE score, not quite a 1:1 ratio. This relationship is statistically significant. Additionally, the associated r-squared value means that the final UPCS score alone, with no other variables, explains roughly 15.6% of the variation in NSPIRE scores. There is clearly more to a development's NSPIRE score than its prior UPCS score, but the prior score is an important factor. See the Finding #1 Appendix for the full statistical output.
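As a quick sanity check on those coefficients (taken from the regression output in the Finding #1 Appendix), plugging the paired sample's average final UPCS score into the fitted line lands almost exactly on the observed average NSPIRE score:

```python
# Fitted line from the Finding #1 Appendix regression:
# predicted NSPIRE score = 44.21 + 0.488 * final UPCS score
intercept, slope = 44.21146, 0.48807

mean_final_upcs = 60.50079          # paired-sample mean from Figure 9
predicted = intercept + slope * mean_final_upcs
print(round(predicted, 1))          # ~73.7, close to the observed NSPIRE mean of 73.74
```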

The line of scores equaling exactly 59 points under NSPIRE, visible in Figure 2, is discussed next.

Finding #2: The data suggest that HUD is failing units due to excessive per-unit deficiencies

Many projects scored exactly 59 points under NSPIRE; some had lower UPCS scores, some higher. In fact, projects were 11.7 percentage points more likely to score exactly 59 points under NSPIRE than under UPCS, where projects had only a 0.78% probability of scoring exactly 59. Both of these figures are statistically significant. See the Finding #2 Appendix for the full output.

HUD did not provide a code book with these data confirming that this pattern is due to per-unit deficiencies. However, the Scoring Notice states: “In the NSPIRE final rule and proposed Scoring notice, HUD identified three inspectable areas: Unit, Inside, and Outside. For scoring, HUD proposed that properties be rated against two performance thresholds: (1) Properties need to score 60 or above in all inspectable areas (“Property Threshold of Performance”), and (2) a “Unit Threshold of Performance”; where a loss of 30 points or more in the Unit portion of the inspection will result in a score adjustment to 59 or failing, even if the Inside and Outside portions of the inspection allowed it to score over 60 [….] Additionally, HUD will only lower the score to 59 if it was previously 60 or above. HUD will not further adjust scores that were already below 60.”3 The results of this second way to fail are striking.
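Read literally, the unit threshold described in the Scoring Notice amounts to a simple adjustment applied after the area scores are tallied. The sketch below illustrates that reading; the function and variable names are illustrative, not HUD's.

```python
def apply_unit_threshold(property_score: float, unit_points_lost: float) -> float:
    """Cap an otherwise-passing score at 59 when 30 or more points were lost
    in the Unit inspectable area, per the NSPIRE Scoring Notice.  Scores
    already below 60 are not adjusted further."""
    if unit_points_lost >= 30 and property_score >= 60:
        return 59
    return property_score

# A development that would otherwise score 82 but lost 31 unit points fails:
print(apply_unit_threshold(82, 31))   # 59
# A development already below 60 is left alone:
print(apply_unit_threshold(55, 31))   # 55
```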

In fact, the trend is visible not only in Figure 2 above but also across all physical inspections conducted since inspections restarted, as shown in Figure 3 below.

Figure 3: All inspections conducted since the restart of physical inspections in 2021 plotted by date

It seems extremely implausible that developments are scoring exactly 59 at such a rate by coincidence. HUD, therefore, is finding developments failing the “unit threshold of performance” and adjusting their scores down to 59. The health and safety of residents is the most important reason for a physical inspection, and HUD is weighting life-threatening and severe deficiencies in units most heavily. Agencies should focus their efforts there.

Finding #3: For agencies with the Small Rural designation, the first NSPIRE scores have a different distribution with relatively fewer scores falling in the middle

HUD attempts to provide regulatory relief to agencies with the Small Rural designation in several ways, one of which is to use their physical inspection scores as their PHAS scores under certain conditions, making the inspection of developments operated by Small Rural agencies extremely important.4

Only 75 public housing developments operated by Small Rural agencies have been inspected under NSPIRE. Their average final UPCS score was 85.1, compared with 76.5 under NSPIRE. A side-by-side comparison of UPCS and NSPIRE scores is available in Figure 4. When comparing the distribution of scores by band, 90-100 is still the most common score band for Small Rural agencies; however, below 60 is now the second-most common band, whereas it was the rarest under UPCS. These changes are likely due to two factors:

  1. Only 75 developments operated by Small Rural agencies have received an NSPIRE inspection, so it is simply too soon to tell whether this trend will hold; NAHRO suspects the distribution will change as the sample grows, and
  2. Small Rural agency scores generally mirror trends seen across all public housing developments, and that includes the unit threshold of performance resulting in adjusted scores of 59. See the Finding #3 Appendix for more.

It is too soon to draw firm conclusions about how agencies will fare under NSPIRE, and doubly so for Small Rural agencies given the small sample included here. NAHRO will continue to monitor this trend and advocate for small agencies.

Figure 4: The number of Small Rural inspections falling into each score band for final UPCS score and NSPIRE score
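A distribution like Figure 4 can be reproduced by binning each sample of scores, for example with pandas. The band edges below and the small_rural flag column are assumptions for illustration; only the “below 60” and “90-100” bands are named in the text.

```python
import pandas as pd

# Hypothetical band edges; the figure's exact bands may differ.
bands = [0, 60, 70, 80, 90, 101]
labels = ["below 60", "60-69", "70-79", "80-89", "90-100"]

# small_rural is a hypothetical boolean flag column on the paired sample.
small_rural = paired[paired["small_rural"]]

upcs_bands = pd.cut(small_rural["inspection_score_upcs"],
                    bins=bands, labels=labels,
                    right=False).value_counts().sort_index()
nspire_bands = pd.cut(small_rural["inspection_score_nspire"],
                      bins=bands, labels=labels,
                      right=False).value_counts().sort_index()
print(pd.DataFrame({"Final UPCS": upcs_bands, "NSPIRE": nspire_bands}))
```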

Finding #4: The data HUD has provided show roughly 48 NSPIRE inspections per month

These data provide one of the first looks at the number of NSPIRE inspections HUD has performed and the speed at which it has ramped them up. HUD has verbally told NAHRO that roughly 2,500 public housing developments have received NSPIRE inspections, but these data do not match that number. According to the data, 635 developments have received an inspection score, a rate of just over 48 per month. More inspections have been conducted than final scores released, but if these scores are all that have been finalized, it will take HUD roughly 13 years at this rate to publish an official NSPIRE score for every public housing development. The NSPIRE Final Rule includes the caveat that NSPIRE scores will not be used to determine Public Housing Assessment System (PHAS) designations until all developments associated with a PHA have received an NSPIRE inspection, so delays in finalizing scores could seriously delay PHAS measuring what it purports to measure.5 As noted earlier, it is not impossible for developments to “slip through the cracks” and go beyond three years without an inspection, so NAHRO will watch this metric in future data publications.
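The 13-year figure follows from straightforward arithmetic. The sketch below works through it; the 13-month window and the roughly 7,500-development portfolio are assumptions inferred from the article's own numbers, not official HUD counts.

```python
published_inspections = 635
months_of_data = 13              # assumed window from the NSPIRE restart to this data release
monthly_rate = published_inspections / months_of_data
print(round(monthly_rate, 1))    # just over 48 per month

# At that pace, a portfolio of roughly 7,500 developments (an assumption implied
# by the 13-year estimate, not an official count) would take about 13 years:
portfolio_size = 7_500
years_to_finish = portfolio_size / monthly_rate / 12
print(round(years_to_finish, 1))  # ~12.8 years
```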

Figure 5: The number of inspections performed per month and recorded in published data by protocol.

Conclusion

It is “too early to call” exactly how NSPIRE score patterns will differ from UPCS in the long term. So far, scores look to be an improvement for agencies that have had both kinds of inspection, HUD does appear to be weighting unit deficiencies most heavily, Small Rural agencies are seeing similar outcomes, and HUD is either ramping up inspections more slowly than expected or is slow to publish scores.


Data and Methodology Appendix

UPCS and NSPIRE data from HUD were used to determine NSPIRE and UPCS inspection scores from 2005 to 2024. The data identified developments, inspections, and agencies through unique ID numbers. Corresponding inspection scores, inspection dates, and geographical information were also provided. A deduplicated version of these data, encompassing all publicly available inspections, was obtained from the Public and Affordable Housing Research Corporation (PAHRC). The deduplicated data included 41,281 UPCS inspections and 635 NSPIRE inspections. Final UPCS and final NSPIRE scores were determined by identifying the most recent inspection date and score of each type for each development. Due to the recent implementation of these standards, most developments did not have an NSPIRE score.
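A minimal sketch of the “final score” extraction described above, assuming a deduplicated file with hypothetical column names:

```python
import pandas as pd

# Deduplicated inspection file; the file name and columns are illustrative.
inspections = pd.read_csv("deduplicated_inspections.csv",
                          parse_dates=["inspection_date"])

# Keep only the most recent inspection of each protocol for each development,
# yielding each development's final UPCS and (where it exists) final NSPIRE score.
final_scores = (inspections
                .sort_values("inspection_date")
                .groupby(["development_id", "protocol"])
                .tail(1))
```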

Finding #1 Appendix: NSPIRE scores are generally an improvement over those same properties’ most recent UPCS inspections

As discussed under Finding #1, there were three ways to evaluate the change to NSPIRE scores: all UPCS scores vs. all NSPIRE scores; the final UPCS score recorded for each project vs. all NSPIRE scores; or the final UPCS score vs. the NSPIRE score for projects that received both types of inspection. The results for methods 1 and 2 are below.

Method 1: all 41,000-plus UPCS scores vs. NSPIRE scores

With data going back to 2005, we can take a broad view of physical inspections. In Figure 6 below, each dot represents the final numerical score of one physical inspection.

Figure 6: Over time, the average physical inspection score has fallen as declining public housing funding has required housing authorities to do more with less

If you find the scatterplot above overwhelming, consider the average score by year in Figure 7 below.

Figure 7: The average score of all inspections done in each year, 2008 to 2024

Comparing NSPIRE scores against all UPCS scores means that you are comparing the portfolio in 2023-2024 to scores from more than a decade ago, many of which are no longer the most recent.

Method 2: every development’s latest UPCS score vs. NSPIRE scores

While this method avoids overweighting UPCS scores by counting several inspections for each property against just one NSPIRE score, it still includes thousands of projects that were inspected under UPCS but not under NSPIRE. It is important to note that NSPIRE inspection scheduling is not random: HUD is inspecting the projects with the oldest and lowest final UPCS scores first, so this comparison would measure the best performers under UPCS but not necessarily under NSPIRE.

To evaluate this method, we performed a two-sample t-test assuming unequal variances. This test compares the average of all final UPCS scores with the average of all NSPIRE scores and gauges whether the difference is statistically significant; it was. The output is below in Figure 8.

t-Test: Two-Sample Assuming Unequal Variances

                                Average of Last UPCS Score    NSPIRE Score
Mean                            78.99327                      73.76535
Variance                        289.9041                      424.5426
Observations                    8326                          635
Hypothesized Mean Difference    0
df                              702
t Stat                          6.233483
P(T<=t) one-tail                3.93E-10
t Critical one-tail             1.647027
P(T<=t) two-tail                7.87E-10
t Critical two-tail             1.963349

Figure 8: Difference of means test output, all final UPCS scores vs. all NSPIRE scores
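A sketch of this test with scipy, reusing the hypothetical dataframes from the Finding #1 sketch; equal_var=False gives Welch's test, matching the “unequal variances” output above.

```python
from scipy import stats

# method2_upcs holds each development's final UPCS score; nspire holds all
# published NSPIRE scores.  equal_var=False requests Welch's t-test.
welch = stats.ttest_ind(method2_upcs["inspection_score"],
                        nspire["inspection_score"],
                        equal_var=False)
print(f"t = {welch.statistic:.2f}, two-tailed p = {welch.pvalue:.2e}")
```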

As referenced under Finding #1, the third method resulted in a positive 13-point difference from UPCS to NSPIRE for the properties that received both types of inspection. The output is below in Figure 9.

t-Test: Paired Two Sample for Means

                                Final UPCS Score    NSPIRE Score
Mean                            60.50079            73.7401
Variance                        278.1806            425.8942
Observations                    631                 631
Pearson Correlation             0.394453
Hypothesized Mean Difference    0
df                              630
t Stat                          -15.9908
P(T<=t) one-tail                7.33E-49
t Critical one-tail             1.647276
P(T<=t) two-tail                1.47E-48
t Critical two-tail             1.963737

Figure 9: Paired difference of means test output, final UPCS score vs. NSPIRE score for the developments that received both inspection types

Finally, the regression output showing the relationship between final UPCS score and NSPIRE score is below.

Regression Statistics
Multiple R            0.394453
R Square              0.155593
Adjusted R Square     0.15425
Standard Error        18.97893
Observations          631

ANOVA
              df      SS          MS          F           Significance F
Regression    1       41747.63    41747.63    115.9013    6.36E-25
Residual      629     226565.7    360.1999
Total         630     268313.4

                    Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept           44.21146        2.84499           15.54011    2.5E-46     38.62463     49.79829
Final UPCS Score    0.48807         0.045335          10.76575    6.36E-25    0.399043     0.577097

Figure 10: Regressing Final NSPIRE Score on Final UPCS Score
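An equivalent regression could be run with scipy's linregress on the hypothetical paired sample from the Finding #1 sketch:

```python
from scipy import stats

# Regress NSPIRE score on final UPCS score for the 631 paired developments.
fit = stats.linregress(paired["inspection_score_upcs"],
                       paired["inspection_score_nspire"])
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.2f}, "
      f"R^2 = {fit.rvalue**2:.3f}")
```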

Finding #2 Appendix: HUD really will fail agencies with too many significant deficiencies in living units

Another view of the proliferation of scores equaling exactly 59 is to regress an indicator for scoring exactly 59 on an indicator for the inspection protocol, which is equivalent to a difference of proportions test. The output is listed below.

Regression Statistics
Multiple R            0.146537
R Square              0.021473
Adjusted R Square     0.02145
Standard Error        0.096173
Observations          41916

ANOVA
              df       SS          MS          F           Significance F
Regression    1        8.507276    8.507276    919.7742    7.2E-200
Residual      41914    387.6756    0.009249
Total         41915    396.1828

                              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept                     0.007776        0.000473          16.42764    1.88E-60    0.006848     0.008704
PROTOCOL BINARY (1=NSPIRE)    0.116633        0.003846          30.32778    7.2E-200    0.109096     0.124171

Figure 11: The output of regressing a score binary variable (1 = scored exactly 59, 0 = scored any other number of points) on a protocol binary variable (1 = NSPIRE used, 0 = UPCS used)
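Because both variables are binary, this is a linear probability model: the intercept is the share of UPCS inspections scoring exactly 59, and the slope is the additional share under NSPIRE. A sketch with the same hypothetical columns used in the earlier sketches:

```python
from scipy import stats

# 1 if the inspection scored exactly 59, 0 otherwise.
scored_59 = (inspections["inspection_score"] == 59).astype(int)
# 1 if the inspection used NSPIRE, 0 if it used UPCS.
is_nspire = (inspections["protocol"] == "NSPIRE").astype(int)

lpm = stats.linregress(is_nspire, scored_59)
print(f"UPCS share scoring exactly 59: {lpm.intercept:.4f}, "
      f"additional share under NSPIRE: {lpm.slope:.4f}")
```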

Finding #3 Appendix: For agencies with the Small Rural designation, the first NSPIRE scores have a different distribution with relatively fewer scores falling in the middle

Again, it is hard to make definitive claims about how NSPIRE is affecting Small Rural agencies when REAC has performed only a handful of these new inspections. However, taking a broad view of how Small Rural agencies compare with the universe of public housing developments, Small Rural and otherwise, it becomes clear that the trends seen across all developments generally hold for small agencies, though the smaller sample tends to produce fewer poor-score outliers. Figure 12 shows that, particularly since UPCS inspections resumed after the COVID-19 hiatus, Small Rural development inspections have generally mirrored the rest of the portfolio under both UPCS and NSPIRE. This includes a proliferation of scores equaling exactly 59 and fewer scores in the 60s, likely one of the major drivers of the new score distribution.

Figure 12: All UPCS and NSPIRE scores plotted by date for both Small Rural and Non-Small Rural developments

Footnotes

[1] Economic Growth, Regulatory Relief, and Consumer Protection Act: Implementation of National Standards for the Physical Inspection of Real Estate (NSPIRE). U.S. Department of Housing and Urban Development (HUD). https://www.federalregister.gov/documents/2023/05/11/2023-09693/economic-growth-regulatory-relief-and-consumer-protection-act-implementation-of-national-standards

[2] Physical Inspection Scores. U.S. Department of Housing and Urban Development (HUD). https://www.huduser.gov/portal/datasets/pis.html, and Physical Inspection Scores by State for Public Housing. U.S. Department of Housing and Urban Development (HUD). https://www.hud.gov/program_offices/public_indian_housing/reac/products/prodpass/phscores

[3] National Standards for the Physical Inspection of Real Estate and Associated Protocols, Scoring Notice. U.S. Department of Housing and Urban Development (HUD). https://www.federalregister.gov/documents/2023/07/07/2023-14362/national-standards-for-the-physical-inspection-of-real-estate-and-associated-protocols-scoring

[4] Notice PIH 2023-33(HA). U.S. Department of Housing and Urban Development (HUD). https://www.hud.gov/sites/dfiles/PIH/documents/PIH%20Notice%202023-33_Small%20Rural%20PH.pdf

[5] Economic Growth, Regulatory Relief, and Consumer Protection Act: Implementation of National Standards for the Physical Inspection of Real Estate (NSPIRE). U.S. Department of Housing and Urban Development (HUD). https://www.hud.gov/sites/dfiles/PIH/documents/NSPIREFinalRuleMay112023.pdf
