What makes a good estimator? What is an estimator? Why should you care? There's a whole branch of statistics called estimation theory that deals with these questions, and we won't do it justice in a single blog post. However, modern methods for estimating effects have come a long way in recent years, and we're excited to share some of the methods we use in an upcoming post. This post will serve as a gentle introduction to the subject and a basis for understanding what makes some of those modern estimators so exciting.
Let's start with the target parameter - that's the thing you want to know, which you hope to compute from the data. This is sometimes also called the estimand. An estimator is a function that maps the observed data to a number; this number is often called an estimate. The estimator estimates the target parameter. You interact with estimators all the time without even thinking about it - mean, median, mode, min, max, etc. For example, suppose we want to estimate the mean of some distribution from a sample. There are many estimators we could use: the first observation, the median, the sample mean, or something even fancier. Which one should we use? Intuitively, the first option seems pretty silly, but how do you choose between the other three? Fortunately, there are ways to characterize the relative strengths and weaknesses of these approaches.
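To make this concrete, here is a minimal Python sketch (the distribution, its parameters, and the sample size are illustrative assumptions, not anything used elsewhere in this post) that computes all three candidate estimates from a single sample:

```python
import numpy as np

rng = np.random.default_rng(0)

def first_observation(x):
    # A perfectly legal, if silly, estimator of the mean
    return x[0]

candidates = {
    "first observation": first_observation,
    "sample median": np.median,
    "sample mean": np.mean,
}

# One sample of size 50 from a distribution whose true mean is 10
sample = rng.normal(loc=10, scale=5, size=50)

for name, estimator in candidates.items():
    print(f"{name}: {estimator(sample):.2f}")
```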
First, let's consider what characteristics we might look for in an ideal estimator, in non-technical terms - think of it as a dating profile:
"Open-minded scientist seeking estimator for reasonably accurate point estimates with small confidence intervals. Should perform reasonably well in small samples. Bonus points for not being fazed by misspecification."
Anyway, that's what we often want. This brings up one of the first interesting points about estimator selection: we may have different priorities depending on the situation, so the best estimator may be context-dependent and subject to human judgment. If we have a large sample, we may not care about small-sample properties. If we know we are working with messy data, we may prioritize estimators that are not easily thrown off by outliers.
Now let's lay out the properties of estimators more formally and discuss how they map onto our informal wish list. We will denote the estimator for a given sample size n as \(t_n\) and the true target parameter of interest as \(\theta\).
An estimator is unbiased if its expectation is equal to the parameter of interest:
\[E[t_n] = \theta\]
This seems reasonable - we would like our estimator to estimate the right thing on average, although we are sometimes willing to trade off bias and variance.
\(t_n\) is consistent if it converges to the true value \(\theta\) as more and more observations are acquired. This refers to a specific type of convergence (convergence in probability), defined as:
\[\lim_{n \to \infty} P\left(|t_n - \theta| > \epsilon\right) = 0 \quad \text{for all } \epsilon > 0\]
This is sometimes called "weak convergence" because we are not saying that the limit of \(t_n\) is \(\theta\); rather, as the sample size increases, the sampling distribution of the estimator becomes more and more concentrated around the parameter.
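As a quick illustration of convergence in probability, here is a small simulation sketch (the normal distribution, its parameters, and the choice of \(\epsilon\) are illustrative assumptions) that approximates \(P(|t_n - \theta| > \epsilon)\) for the sample mean at increasing sample sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean = 10
epsilon = 0.5

# Approximate P(|t_n - theta| > epsilon) by repeated sampling for growing n
for n in [10, 100, 1000, 10000]:
    samples = rng.normal(loc=true_mean, scale=5, size=(1000, n))
    estimates = samples.mean(axis=1)
    miss_rate = np.mean(np.abs(estimates - true_mean) > epsilon)
    print(f"n={n:>5}: P(|t_n - theta| > {epsilon}) ~ {miss_rate:.3f}")
```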
Hmm. So what is the difference between unbiasedness and consistency? They sound similar. A quick example illustrates the distinction nicely.
Suppose you are trying to estimate the population mean (\(\mu\)) of a distribution. Here are two possible estimators you could try:
- The first observation, \(X_1\)
- The sum of the observations divided by (n+100), \(\frac{\sum X_i}{n + 100}\)
\(E[X_1] = E[X_i] = \mu\), so the first estimator is unbiased! However, it seems like an intuitively poor estimator of the mean, probably because your gut tells you it is not consistent: taking larger and larger samples does not make us any more confident that we are close to the mean.
On the other hand, the second estimator is clearly biased:
\[\begin{aligned} E \left[ \frac{\sum X_i}{n + 100} \right] &= \frac{1}{n+100} \sum E[X_i] \\ &= \left(\frac{n}{n+100} \right) \mu \\ &\neq \mu \end{aligned}\]
But! As n increases, \(\frac{n}{n+100} \rightarrow 1\). So the second estimator is consistent even though it is biased (in fact, it is asymptotically unbiased). In this light, it seems consistency may matter more than unbiasedness if you have a large enough sample (Figure 1). Both consistency and unbiasedness speak to our desire for an estimator that is "reasonably accurate."
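If you'd like to see this numerically as well as in Figure 1, here is a rough simulation sketch (the normal data generating process and its parameters are assumptions for illustration) comparing the two estimators across sample sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 10, 5

for n in [10, 100, 1000, 10000]:
    samples = rng.normal(mu, sigma, size=(1000, n))
    first_obs = samples[:, 0]                      # unbiased, but not consistent
    shrunk_mean = samples.sum(axis=1) / (n + 100)  # biased, but consistent
    print(f"n={n:>5}  "
          f"mean of X_1 estimates: {first_obs.mean():.2f} (sd {first_obs.std():.2f})  "
          f"mean of sum/(n+100) estimates: {shrunk_mean.mean():.2f} (sd {shrunk_mean.std():.2f})")
```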
A differentiating feature even among consistent estimators is how quickly they converge in probability. You can have two estimators, estimator A and estimator B, both of which are consistent, yet their rates of convergence may be completely different. All else being equal, we would prefer the estimator that converges faster, since this provides better reliability at finite sample sizes, which is the reality most of us live in (Figure 2).
We say that an estimator is asymptotically normal if, as the sample size approaches infinity, the distribution of the difference between the estimator and the true value of the target parameter becomes more and more normally distributed.
This probably brings to mind the Central Limit Theorem (CLT). In one of its most basic forms, the CLT describes the behavior of the sum (and therefore the mean) of independent and identically distributed (iid) random variables. Suppose you have an iid sample \(X_1, ..., X_n\) of a random variable \(X\) drawn from an unknown distribution D with finite mean m and finite variance v (i.e., X ~ D(m, v)). The CLT tells us that the sample mean \(\bar{X}\) is such that \(\sqrt{n} (\bar{X} - m)\) converges in distribution to \(Normal(0, v)\). We can rearrange this to show that \(\bar{X}\) is approximately distributed as \(Normal(m, v/n)\). A version of the CLT also applies to many other estimators: the difference between the estimator \(t_n\) and the true value of the target parameter \(\theta\) converges to a normal distribution centered at 0, with some finite variance that shrinks with the sample size, n.
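Here is a small sketch of the CLT in action, using a deliberately skewed distribution (an exponential with mean 2 and variance 4 - an illustrative assumption, not anything from the figures in this post):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Skewed underlying distribution: exponential with mean m = 2, variance v = 4
m, v = 2.0, 4.0
n = 500

samples = rng.exponential(scale=2.0, size=(10000, n))
scaled = np.sqrt(n) * (samples.mean(axis=1) - m)

print("empirical variance of sqrt(n)*(Xbar - m):", scaled.var())  # close to v = 4
print("empirical skewness:", stats.skew(scaled))                  # close to 0, i.e. roughly normal
```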
You may not have even known you wanted this property in an estimator, but it is exactly what you need (wow, that dating profile analogy is really picking up steam here). Remember how you wanted small confidence intervals? Asymptotic normality is the basis for using the standard closed-form formula for confidence intervals. We can use the familiar formula `estimate +/- 1.96*standard_error` to create confidence intervals because we know we have a normal limiting distribution, which lets us derive the multiplier on the standard error (1.96 for a 95% confidence interval).
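In code, that formula is just a couple of lines (the simulated data here are an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=10, scale=5, size=200)

estimate = x.mean()
standard_error = x.std(ddof=1) / np.sqrt(len(x))

# 95% confidence interval: estimate +/- 1.96 * standard_error
ci = (estimate - 1.96 * standard_error, estimate + 1.96 * standard_error)
print(f"estimate: {estimate:.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```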
Asymptotic normality is not strictly required in order to characterize uncertainty - for some estimators there are non-parametric approaches that do not rely on this assumption. In fact, asymptotic normality depends not only on the estimator but also on the data generating process and the target parameter, and some target parameters do not have asymptotically normal estimators. However, where asymptotic normality does hold, using it typically yields confidence intervals that are narrower than their non-parametric counterparts. And isn't it nice to have a formula you can work out with a pencil?
An estimator is said to be "efficient" if it achieves the Cramér-Rao lower bound, which is the theoretical minimum achievable variance given the inherent variability of the random variable itself. Strictly speaking that bound applies to parametric estimators - the non-parametric crowd has different terminology for the same general idea - but let's not get bogged down in that particular discussion. In practice, we are often concerned with relative efficiency, that is, whether one estimator is more efficient (i.e., has lower variance) than another.
Let's take a step back and recall the standard confidence interval formula from a second ago (`estimate +/- 1.96*standard_error`). Asymptotic normality is what allowed us to construct this symmetric interval, and the normal percentiles gave us the 1.96. The estimate is the number we got from our estimator. But where does the standard error in this formula come from?
The standard error we use in the confidence interval has three main factors:
- number of observations, n
- inherent variation in the data generation process itself, e.g. Var(X)
- the variance associated with the particular estimator used, e.g. Var(\(t_n\)). Remember that the variance of functions of random variables is different from the variance of the random variables themselves.
We want the standard error to be small, since that gives us narrower confidence intervals. There is usually very little we can do about the inherent variability of a random variable - it is what it is. And if the data have already been collected, we cannot choose n. What we do have control over is the choice of estimator, and a good choice here can lead to a smaller overall standard error, giving us narrower confidence intervals. (Note: this is one of the key points of this entire blog post.)
Suppose we have a sample of data from a normal distribution and we want to estimate the mean of that distribution. Consider two possible estimators: the sample mean and the sample median. Both are unbiased and consistent estimators of the population mean (since we have assumed the population is normal and therefore symmetric, the population mean equals the population median). Figure 3 shows that at any sample size, the median is a less efficient estimator than the mean; that is, estimates from repeated samples are more widely spread for the median.
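A quick simulation sketch along the lines of Figure 3 (not the actual code behind the figure; the parameters are illustrative) makes the efficiency gap visible:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n = 0, 1, 100

samples = rng.normal(mu, sigma, size=(10000, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# For a normal population, the sample median has roughly pi/2 times
# the variance of the sample mean, i.e. it is less efficient.
print("variance of sample mean:  ", means.var())
print("variance of sample median:", medians.var())
print("relative efficiency of median vs. mean:", means.var() / medians.var())
```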
The mean/median comparison is a fairly trivial example to illustrate the concept of efficiency, but in practice we rarely find ourselves choosing between those two estimators. A more relevant example is the difference between analyzing an A/B test with a traditional t-test versus a regression model that includes baseline (pre-treatment) covariates. The motivation for using a regression model to analyze a cleanly randomized experiment is variance reduction. If our outcome is affected by many factors, each of them adds variability to the outcome. When we include some of those factors as covariates, they absorb some of the overall variability in the outcome, which can make the treatment effect easier to see.
Suppose we have a simple A/B test with a randomly assigned treatment and two pre-treatment covariates that affect the outcome. We could choose to analyze the data with a difference-in-sample-means approach or with a regression model that includes those two known pre-treatment covariates. Figure 4 shows the estimates and confidence intervals from 1,000 such simulated experiments. Both methods give valid confidence intervals that cluster around the true treatment effect, but in this simulation the confidence intervals were more than 6 times wider for the sample-mean approach than for the regression approach.
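Here is a minimal sketch of the kind of simulation behind Figure 4 - not the actual code used for the figure, and the data generating process (coefficients, noise, sample size) is an assumption for illustration - comparing a difference-in-means analysis with a covariate-adjusted regression using statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
true_effect = 0.5

# Randomized treatment plus two pre-treatment covariates that drive the outcome
treatment = rng.binomial(1, 0.5, size=n)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = true_effect * treatment + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Difference in means (equivalent to a regression on the treatment indicator alone)
simple = sm.OLS(y, sm.add_constant(treatment)).fit()

# Regression adjusting for the known pre-treatment covariates
adjusted = sm.OLS(y, sm.add_constant(np.column_stack([treatment, x1, x2]))).fit()

print("difference in means:", simple.params[1], "SE:", simple.bse[1])
print("covariate-adjusted: ", adjusted.params[1], "SE:", adjusted.bse[1])
```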
In this toy example, we had an unrealistically easy data generating process to model. There was nothing complex, non-linear, or interactive about it, so the most obvious regression specification was correct. In a randomized experiment, the model does not need to be perfectly specified to get at least some benefit from variance reduction. However, if the treatment is not randomized, model misspecification can lead to inconsistent estimates of the treatment effect.
Robustness is defined more loosely than some of the previous properties. A robust estimator is not unduly influenced by violations of assumptions about the data or the data generating process. Robust estimators are often (but not always) less efficient than their non-robust counterparts on well-behaved data, but they provide greater assurance when the data deviate from our expectations.
A classic example is taking small samples from a skewed distribution, which can produce outliers. In this case, the mean can be a poor estimator of central tendency because it can be strongly affected by outliers, especially in a small sample. In contrast, the median, a classically robust estimator, is barely affected by them.
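A quick sketch (the lognormal distribution and sample size are illustrative assumptions) shows how much more the mean bounces around than the median in small, skewed samples:

```python
import numpy as np

rng = np.random.default_rng(7)

# Many small samples from a heavy-tailed, skewed distribution (lognormal)
n = 20
samples = rng.lognormal(mean=0, sigma=1.5, size=(5000, n))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# The mean estimates are dragged around by occasional huge draws;
# the median estimates are far more stable.
print("spread (sd) of sample means:  ", means.std())
print("spread (sd) of sample medians:", medians.std())
```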
Another example concerns the assumption we often make that the variance associated with a particular variable is the same across all observations (homoskedasticity). This comes up frequently in regression, where we assume that the errors \(\epsilon_i\) in \(y_i = \beta X_i + \epsilon_i\) are independent and identically distributed. If that condition is not met (e.g., the variance of \(\epsilon_i\) depends on \(X_i\)), the standard approach to estimating the standard errors of the \(\beta\) coefficients will be incorrect and possibly optimistically small. The Huber-White (or "sandwich") estimator of the standard error is a robust estimator designed to give a consistent estimate under these conditions, typically resulting in wider confidence intervals. Safety first!
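For example, statsmodels exposes heteroskedasticity-robust ("sandwich") standard errors through the `cov_type` argument; the data generating process below is an illustrative assumption:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 500
x = rng.uniform(0, 10, size=n)

# Heteroskedastic noise: the error variance grows with x
y = 2.0 * x + rng.normal(scale=0.5 * x, size=n)

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()             # assumes constant error variance
robust = sm.OLS(y, X).fit(cov_type="HC1")  # Huber-White "sandwich" errors

print("classical SE for slope:", classical.bse[1])
print("robust SE for slope:   ", robust.bse[1])
```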
Now we know what a target parameter is (the thing you want to estimate from the data), what an estimator is (a function that maps a sample to a value), and what properties we might look for in an estimator. We also discussed how and where these properties are useful.
Let's go back to the estimator's original dating profile and translate the everyday language into technical terms:
- "reasonably accurate point estimates" \(\rightarrow\) consistent
- "small confidence intervals" \(\rightarrow\) asymptotically normal, high relative efficiency
- "does not respond to disputes about reality" \(\rightarrow\) compact
One last question: why should you care? Estimation is, at its heart, a pursuit of truth - we don't do it just for fun. In that pursuit, it's important to know what trade-offs you're making and whether they are reasonable trade-offs in the context of your data. Remember that attaining these properties is a tango between the estimator and the data generating process, which means you need to know something about the data you're working with to make good decisions. You should also know what you might be leaving on the table that you could have for free: if you have pre-treatment covariates that predict the outcome, why use a difference in means as your estimator and live with wider confidence intervals than necessary? We have only scratched the surface here. In a future post we'll be excited to talk about doubly robust estimation, which offers advantages in terms of consistency, efficiency, and robustness.
Come work with us!
We're a diverse team dedicated to building great products, and we'd appreciate your help. Want to build amazing products with amazing peers? Come with us!