So last year I started writing that if I was to implement asymmetric passwords right now, I would use Argon2id for key derivation, Ed25519 for signatures, and Ristretto255 for the OPRF. I haven't really reviewed those choices since, but they are probably still good.
Argon2id output size must match the private key size of Ed25519: 256 bits.
The other Argon2id parameters can just be left configurable in the implementation, but there are some universally applicable ideas:
You are competing with the insecurity of just sending the plain password to the server. Any parameters are profoundly more secure than that. The private key derived with this Argon2id run never leaves your client and is only around in memory momentarily. The server only sees that private key's signatures and public key.
The key UX consideration is how long the derivation takes and how much it keeps the user's device from doing anything else useful while logging in. Unfortunately many users in the world still have fairly low-spec hardware, and you are competing with the speed of sending their password in plain text.
The recommended Argon2 tuning order is to raise memory as much as you can - the whole point of Argon2 is to be memory hard to take away any cost advantage from GPU, FPGA, and ASIC crackers - then raise the parallelism, then raise the iterations.
So that lets us figure out the best parameters that can be used across all devices with some minimum specs that we want to support while still having an acceptably fast login experience on all of them.
And obviously for users who know to seek out those settings, we could also let users raise their account's minimum Argon2id parameters higher than the default minimum.
But can we do even better?
In principle we could always have higher parameters than our secure minimum while keeping acceptably fast logins if the user's device has the specs for it. This just adds a small risk of a slow login if they later try to log in from an atypically low-spec or constrained system.
So why not just automate that process? First try to get larger allocations until the system refuses, while also doing a microbenchmark to test if using that much memory at once hits a slowdown. Then do a microbenchmark to see when speedup from parallelism drops off. You might be able to combine these two. Now start running an Argon2id implementation that gives you some way to choose at each iteration whether to stop or continue, and stop once you're less than one iteration's duration away from your maximum acceptable login time.
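Here's a rough sketch of just the stopping rule from that last step, in Python. It assumes a hypothetical `run_one_pass` callable that advances some resumable Argon2id implementation by one pass (I'm not aware of a mainstream library that exposes this, so treat it as a placeholder), and the names `run_passes_within_budget`, `budget_seconds`, and `min_passes` are made up for illustration. The clock is injectable only so the logic can be exercised without real waiting.

```python
import time
from typing import Callable


def run_passes_within_budget(
    run_one_pass: Callable[[], None],
    budget_seconds: float,
    min_passes: int = 1,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run passes until another one would probably overrun the budget.

    Returns the number of passes performed, which is the iteration count
    that would be saved alongside the new entry.
    """
    start = clock()
    passes = 0
    longest_pass = 0.0
    while True:
        pass_start = clock()
        run_one_pass()  # hypothetical: one more pass over the memory
        now = clock()
        passes += 1
        longest_pass = max(longest_pass, now - pass_start)
        remaining = budget_seconds - (now - start)
        # Stop once less than one pass's duration is left, but never
        # before the minimum pass count (the security floor) is reached.
        if passes >= min_passes and remaining < longest_pass:
            return passes
```

Using the longest pass seen so far as the estimate for the next one is a deliberately pessimistic choice. The `min_passes` floor wins over the time budget, so a slow device gets a slow login rather than a weak key.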
This is getting better, but what do you do for users who log in from devices with significantly different specs? Maybe they dropped and broke their phone so they're using an old temp phone for a while. Or they normally log in from a high-end gaming rig but today they're at grandma's and hopping on her budget computer to get something done.
When a user logs in, we have to use the saved parameters to check their login, otherwise Argon2id would give us a different private key and the Ed25519 public key wouldn't match the saved one.
But if the measured best parameters are different enough, then after logging in successfully we could start the creation of a new entry as a background lower-priority task.
Unless the user leaves in about a login's worth of time right after logging in, that will finish and we can save that so the login uses the best settings next time.
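What counts as "different enough" is a judgment call. As a sketch, assuming a made-up threshold of a factor of two on any single parameter (the `Argon2Params` type, `differs_enough`, and the default `factor` are all illustrative, not from any library), the check could look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argon2Params:
    memory_kib: int  # Argon2 memory cost, in KiB
    lanes: int       # parallelism
    passes: int      # iterations


def _ratio(a: int, b: int) -> float:
    """How many times larger the bigger value is than the smaller one."""
    return max(a, b) / min(a, b)


def differs_enough(saved: Argon2Params, measured: Argon2Params,
                   factor: float = 2.0) -> bool:
    """True if any parameter differs by at least `factor` (assumed threshold)."""
    return (
        _ratio(saved.memory_kib, measured.memory_kib) >= factor
        or _ratio(saved.lanes, measured.lanes) >= factor
        or _ratio(saved.passes, measured.passes) >= factor
    )
```

Only when this returns true would the client bother kicking off the background derivation for a new entry.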
Okay, but what about churn if you use multiple devices regularly? What about needlessly sending a less secure key after throwing away a more secure parameter entry? What about needlessly spending more time logging in after throwing away a lower-parameter entry?
We could actually have more than one entry of Argon2id parameters, salt, and public key for a user. So long as the minimum parameters meet our security bar, the only downside is a tiny risk of a weakness in the composition of Ed25519 deriving public keys from separate private keys that result from independently randomly salted and differently parameterized Argon2id runs somehow leaking statistical information about the password.
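To make the multiple-entries idea concrete, here is a minimal sketch of what a stored entry and the client-side choice between entries might look like. Everything here is an assumption for illustration: the `LoginEntry` fields, the `pick_entry` name, and especially the selection rule (take the most demanding entry whose memory fits on this device, else fall back to the least demanding one and accept a slow login).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LoginEntry:
    memory_kib: int    # Argon2id memory cost, in KiB
    lanes: int         # Argon2id parallelism
    passes: int        # Argon2id iterations
    salt: bytes        # independent random salt per entry
    public_key: bytes  # Ed25519 public key derived under these parameters


def pick_entry(entries: Sequence[LoginEntry],
               device_memory_kib: int) -> LoginEntry:
    """Pick which saved entry this device should log in with."""
    if not entries:
        raise ValueError("user has no saved entries")
    usable = [e for e in entries if e.memory_kib <= device_memory_kib]
    if usable:
        # Most demanding entry that still fits in this device's memory.
        return max(usable, key=lambda e: (e.memory_kib, e.passes, e.lanes))
    # Nothing fits: fall back to the least memory-hungry entry.
    return min(entries, key=lambda e: e.memory_kib)
```

The client would also have to tell the server which entry it picked, so the signature gets checked against the matching public key.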
Each entry can be expired once it hasn't been needed for a login for too long - or we could even make the code pluggable, since this is a cache eviction problem and there are many different schemes for that, maybe someone knows a better one.
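A sketch of what "pluggable" could mean here: the eviction policy is just a function from the last-used timestamps to the set of entry IDs to drop. The names (`EvictionPolicy`, `idle_longer_than`, `apply_eviction`) and the string entry IDs are invented for this example, and the idle limit is whatever you decide it should be.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Mapping, Set

# A policy looks at when each entry was last used and says which to evict.
EvictionPolicy = Callable[[Mapping[str, datetime], datetime], Set[str]]


def idle_longer_than(max_idle: timedelta) -> EvictionPolicy:
    """Policy: evict entries not used for a login within `max_idle`."""
    def policy(last_used: Mapping[str, datetime], now: datetime) -> Set[str]:
        return {entry_id for entry_id, used in last_used.items()
                if now - used > max_idle}
    return policy


def apply_eviction(last_used: Mapping[str, datetime], now: datetime,
                   policy: EvictionPolicy) -> Dict[str, datetime]:
    """Return the entries that survive the given policy."""
    evicted = policy(last_used, now)
    return {entry_id: used for entry_id, used in last_used.items()
            if entry_id not in evicted}
```

Swapping in an LRU-with-cap or any other scheme is then just a matter of passing a different `policy`.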
It feels vulnerable to a downgrade attack, but it isn't meaningfully so, because if you didn't have this scheme then you would just have to pick parameters that are secure enough - so just use those parameters as your minimums.
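In code, that floor is just a check the client runs before deriving anything from parameters the server hands it. This is a sketch under assumptions: the floor values reuse the 128MiB, 1 lane, 3 iterations figures from my testing mentioned in this post, the `Argon2Params` type and `meets_floor` name are made up, and comparing memory and passes independently is a conservative simplification rather than a real strength metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argon2Params:
    memory_kib: int  # Argon2 memory cost, in KiB
    lanes: int       # parallelism
    passes: int      # iterations


# Assumed floor: 128 MiB, 1 lane, 3 passes. Tune for your own deployment.
FLOOR = Argon2Params(memory_kib=128 * 1024, lanes=1, passes=3)


def meets_floor(params: Argon2Params, floor: Argon2Params = FLOOR) -> bool:
    """Refuse any entry weaker than the floor in memory or passes."""
    return (
        params.memory_kib >= floor.memory_kib
        and params.passes >= floor.passes
        and params.lanes >= 1  # lanes only need to be valid, not a minimum
    )
```

If the client refuses anything that fails this check, the worst a downgrade can achieve is the same parameters a fixed-parameter scheme would have used anyway.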
That seems pretty great, but there's a really sucky part to this scheme: both the login experience and the implementation suffer a lot when you are logging in from a much weaker system than any remembered entry. And I don't just mean it would be slow, I mean it might require your Argon2id implementation to handle cases like "every device until now let us use 2GiB of memory and this one's refusing to give us more than 256MiB, so now we have to compute the same result in an eighth of the space".
(Aside: this is a good example of why "limit this program to [amount] memory" is not really the right resource control knob a lot of the time if the implementation is "refuse to let the program address more memory" rather than "use swap space if it tries to have more memory" - code that is willing to give up speed to fit within your memory constraints shouldn't have to reimplement memory page swapping, and if you are not willing to give it that opportunity then that would be a different knob.)
One way to mitigate that sucky edge case:
Since you have an acceptable minimum parameterization of Argon2id anyway, just keep an entry for that permanently.
This is worse in the event of a server compromise than only having stronger parameter entries, but if you didn't have the dynamism and multiple entries you would still always have this lowest secure parameters entry on the server.
It's still better than just having the lowest parameters in the absence of server secret storage compromise or downgrade attack, because you send a public key and signature corresponding to a private key derived from the password with better parameters.
So I've gone back and forth in my mind about whether this scheme is worth it. There is something really nice about users' password security organically and automatically upgrading as their technology upgrades. On the other hand it is very complex, and would require a lot of reimplementation of some of the most security-sensitive and "don't write this yourself unless you're a cryptographer and computer security expert!" parts of the system, with more cleverness and complexity than existing implementations.
I think the tie breaker for me is that this is all probably very temporary and maybe already obsolete. The future is not users logging in by typing in passwords that they remember. Asymmetric passwords are a great way to make that better, but that's going away. The future is your device knowing cryptographic keys for each login for you, and maybe one password securing the encrypted secret storage where those keys live.
The recent passkey announcements and standardization were a good reminder of that.
Over a year ago I did some testing and found that 128MiB, 1 lane, and 3 iterations was tolerably slow on some realistically old or weak phones and pretty great on new high-end ones. That's lower than you'd use for server-side password hashing, but I'd rather have the security increase of never sending the password to the server. So you could probably dial that up by approximately a year's worth of typical user technology spec improvement and still get a great login experience range.
That's probably good enough until the passkey stuff takes over entirely.