-
Notifications
You must be signed in to change notification settings - Fork 182
Replace use of AppCache PROJECT section with strongly-typed structures #1415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace use of AppCache PROJECT section with strongly-typed structures #1415
Conversation
I am in favor of the contract handler interface if we are going to generalize this approach for other sections (which I think is wise BTW). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cycy has already been testing this and it is working well. I like the idea of using a contract handler interface...
We will need to get quez to change the URL with an appcache entry for the Einstein project. |
src/neuralnet/project.cpp
Outdated
//! The internal implementation uses this class to sort containers of \c Project | ||
//! instances. \c std::sort() cannot sort a collection of \c const objects. | ||
//! | ||
struct MutableProject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need MutableProject. If we only need it for the LowercaseName
we can use boost::iequals and do it on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on a way to get rid of MutableProject
. It mainly exists because I declared the member fields of Project
as const
, so std::sort()
can't operate on the vector because it wants to move the values into memory marked immutable as it rearranges the items. LowercaseName()
might speed up the sort comparison because each project name is compared multiple times, but we sort the set so infrequently that it doesn't really matter...would like to get rid of the extra struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be an idiotic question but why do the member fields need to be const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They...probably don't. I wanted to guarantee that calling code can't modify the contract data in a Project
. The interface of WhitelistSnapshot
seems to prevent this from happening. Need to find some time to verify that. Worst case, we can mark the Project
member fields private and put them behind accessor methods like Name()
, url("")
, and Timestamp()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather do that, and then you can do a sort on the vector.
//! \param url Project URL from contract message value. | ||
//! \param ts Contract timestamp. | ||
//! | ||
Project(const std::string name, const std::string url, const int64_t ts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass non-POD arguments by reference or some compilers will create copies, see https://godbolt.org/z/ARmoPE. Unless that's what we want here due to thread synchronization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Admittedly, I don't fully understand pass-by-value, pass-by-reference, copy, and move semantics in C++ yet. I'm following the guidelines from here:
- If the function intends to change the argument as a side effect, take it by non-const reference.
- If the function doesn't modify its argument and the argument is of primitive type, take it by value.
- Otherwise take it by const reference, except in the following cases
- If the function would then need to make a copy of the const reference anyway, take it by value.
Based on this advice, I think I need to pass by value because Project
takes ownership of those arguments. We already have to create copies of these values somewhere up the call stack when we extract the strings from the transaction message, so passing by value here enables the compiler to optimize the call by moving the string arguments instead of copying them. If we pass references, the compiler can't use move semantics because the reference indicates that the caller might need to hold onto the value, so it performs another copy.
I see the extra allocation in the assembly from the tool you linked, but I can't seem to construct an example to test the above... the compiler seems to optimize the entire thing out. Sorry for being dense--still trying to learn this stuff. Are the recommendations that I'm following reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass references, the compiler can't use move semantics because the reference indicates that the caller might need to hold onto the value, so it performs another copy.
Bloody hell, you're correct. https://www.codesynthesis.com/~boris/blog/2012/06/19/efficient-argument-passing-cxx11-part1/. You live, you learn. It even looks like a modern compiler (gcc-8) will do the std::move
for you while an older one (gcc-6.3) requires some guidance from the developer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, are you sure you can pass it as const if its content shall be moved? Looking at godbolt gcc-8 seems to figure it out and moves/optimizes it anyway whereas gcc-6.3 still creates a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@denravonska From what I understand, a const
parameter doesn't block the move optimization. I spent way too much time trying to find information about this because it's rather confusing, and I couldn't find anything definitive in the case of const
. I ended up building a small program to try to figure this out. Here's the way that I see it--I might be entirely wrong, so please let me know if I've made any mistakes or false judgements:
Let's simplify the the signature for the Project
constructor so we're not talking about std::string
:
Project(const Name project_name);
We now initialize Project
by passing it a Name
object, which might hold a bunch of data that's expensive to copy. Here's the key point that took me a long time to realize: the compiler needs to allocate project_name
before it gives it to the constructor of Project
, so the decision to move or copy happens before binding the object to the project_name
parameter no matter if it's const
or not.
For moving to work, Name
needs an implicitly or explicitly defined move constructor that takes an rvalue reference from another instance of the same type:
Name(const NameClass& from); // copy constructor, vs:
Name(NameClass&& from); // move constructor
As an intermediate step, the compiler creates a Name
value using one of these constructors from the passed value before binding it to the function. If a valid move constructor exists, the compiler can choose to use it when the value received by Project()
is an rvalue reference, like a return value from a function:
Name GetProjectName()
{
Name project_name;
project_name.value = "einstein@home";
return project_name; // returns an rvalue reference to project_name
}
Project project(GetProjectName());
As shown above, the compiler needs to construct a Name
object from the return value of GetProjectName()
to pass to the Project
constructor. Normally, return-value optimization kicks in, so an extra instantiation isn't needed, but if it doesn't, the next best choice is move construction.
This doesn't work for more common lvalues:
Name project_name = GetProjectName();
Project project(project_name); // copy!
Here, the return value is assigned to a variable, so it becomes an lvalue. If we try to pass that to the Project
constructor, the compiler does a normal copy. Traditionally, it seems like C++ gets around this by using references:
Project::Project(const Name& project_name) : m_name(project_name) { ... }
By passing a reference to the constructor, the compiler doesn't add an immediate copy operation. However, because Project
needs to take ownership of the value of project_name
, we have to copy the value behind the reference into the m_name
member field anyway.
It seems that C++11 provides the std::move()
utility to work around these cases. The function converts an lvalue variable into an xvalue for functions that take rvalues to enable move construction (value...value...value...whew!).
Name project_name = GetProjectName();
Project project(std::move(project_name)); // project_name resources moved here
Our previous discussion above about Project
's internal const
fields illustrates how const
does block moves by preventing the compiler from moving values into and out of the object's member fields. But for function parameters, the move occurs before the object binds to the const
parameter.
The Whitelist
data structures don't really benefit from move optimizations because their copy-on-write design requires exactly the opposite functionality, but I want to make sure I understand these concepts for any other code that I write. Similarly, moving doesn't work below--the compiler falls back to a copy of the const
variable:
const Name project_name = GetProjectName();
Project project(std::move(project_name)); // copy!
CC: @jamescowens - would like your eyes on this too for fact checking 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philipswift No claims of sanity were made here 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fail to find cases where passing by const value makes sense. By-value will create copy, no matter const, unless the function is inlined. But const pointer or const reference is good and useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your last example, @cyrossignol , std::move turns & into &&. But it does not change const-ness. Since there is no ctor(const T&&), && decays to & and ctor(const T&) is used. Which is copy ctor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing non-const value makes sense for values smaller than pointer and for values that will be moved inside the function. But move operation is not free either. Referencing is cheapest.
Impact of all this is reduced for code that is all in one translation unit. Because then the compiler may use (partial) inlining and do these optimizations for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @tomasbrod -- I appreciate you taking the time to explain...this was one of the more confusing parts about learning C++ for me, and I'm finally starting to understand it.
During my testing, I was thrown-off by what appeared to be the compiler optimizing-away the copy for some (non-trivial) const
by-value arguments, but I noticed it doesn't happen every time. Also, it seems as if moving into a const argument works just fine because there's often a compiler-generated move constructor. It's just for moving out of a const
value that the compiler should perform a copy, but I think I observed that it doesn't always (maybe when it can avoid it).
I asked @denravonska about it on Slack:
All this makes sense to me so far, and I've settled on the belief that we can't use
const
if we want moves to work. There seems to be a conflict between const-correctness and move semantics. I'd like to know your opinion: is one more important than the other in the design of good C++ code? It makes sense to declare a parameterconst
if the function should not modify it, but we could cause an unwanted copy because the compiler really isn't supposed to move out of a const value. On the other hand, is it a bad form to omitconst
just to enable moves? Do you think there's a best practice?
...and he gave me similar advice: as a rule-of-thumb, non-const
by-value, or const
by-ref.
So, based on both of your suggestions, I'll write functions with:
- non-
const
by-value parameters when the function needs to copy to take ownership anyway const&
parameters otherwise- mutable references or pointers only where strictly necessary
I've updated the code in this PR based on your recommendations. We can review in the next iteration.
src/neuralnet/project.h
Outdated
//! | ||
//! \brief Smart pointer around a collection of projects. | ||
//! | ||
typedef std::shared_ptr<std::vector<Project>> ProjectListPtr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a good idea to also typedef std::vector<Project> ProjectList
for readability later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that--will add to the next version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, similar to the kind of stuff I did in the scraper...
const std::string& value, | ||
const int64_t& timestamp) | ||
{ | ||
if (type == "project" || type == "projectmapping") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is projectmapping
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an old contract type name used before "project"... I didn't see any instances of that one on testnet... need to scan mainnet--if it doesn't exist there either, we can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Carried over from the old code.... probably vestigial.
return Baseurl("") + "stats/"; | ||
} | ||
|
||
return Baseurl("") + "stats/" + type + ".gz"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong assumption. Project stats urls are not required to adhere to .../stats/type.gz .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed--until we change the format of URLs in the project contracts, I'm just doing it similarly to the way the old code does.
//! data with snapshots until the application mutates the data by adding or | ||
//! removing a project. During mutation, a \c Whitelist object copies its data | ||
//! and stores that behind a new smart pointer as an atomic operation, so any | ||
//! existing \c WhitelistSnapshot instances become obsolete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case of mutable whitelist snapshot? Only admin message processing should modify whitelist. And that happens under mutex lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A snapshot is immutable. The comment about mutation applies only to the step that loads contracts from transaction messages. I'm trying to make it easier and faster to access the data by avoiding the lock on a mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was me, I would KISS and just copy the structure to scraper when it needs to run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tomasbrod. I'm trying to KISS for the calling code that actually uses the project data. By abstracting the the details behind these classes, consumers don't need to worry about synchronization or efficient memory management. I'm working on beacon stuff that needs project data, and we could present it in other areas or features as well. The project data changes very infrequently, so I couldn't justify copying it every time something uses it 😄
shared_ptr contain mutex, therefore it is not lock free I appreciate this patch. It will be useful to have Project and Whitelist data structures for future changes to reward calculation. Not limited to the NN namespace. |
Are you sure that is true Tomas? From https://en.cppreference.com/w/cpp/memory/shared_ptr All member functions (including copy constructor and copy assignment) can be called by multiple threads on different instances of shared_ptr without additional synchronization even if these instances are copies and share ownership of the same object. If multiple threads of execution access the same shared_ptr without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur; the shared_ptr overloads of atomic functions can be used to prevent the data race. If their was an implicit mutex then why would all of those cautions be necessary, and why the overloads of atomic functions? |
@tomasbrod The information that I read while trying to figure out how to do this suggests that the implementation of |
Anyway I agree with @tomasbrod this a hell of an improvement over the old code and will be a good basis for extension. @cyrossignol let's get this cleaned up so we can move towards a merge. |
Start adding tests to the whitelist manager
I think I've addressed all the reviews. Thanks to @denravonska, we have tests for these changes. I filled in the gaps there as requested. I'll push the move to a contract handler interface after this. The scope is a little more involved, so it will go into a separate PR. |
@cyrossignol I don't necessarily think the PR merge needs to wait on this, but listdata needs to be fixed up to use the new structure for keytype "project" so that listdata continues to work regardless of whether in the old appcache structure or new. This will have to be extended each time we move a section out of the appcache. |
@cyrossignol, @denravonska and I discussed this and we have agreed that we actually should retire the listdata call as we peel off sections into their own structures. This means we should implement separate calls for each of the former "sections". You had a whitelist test rpc call that you removed that does this perfectly for this first section, project. I would recommend renaming it to getprojects. I think we should stick with the naming convention get where is the old section name. Please do this and I will merge the PR. |
Added: - Replace NeuralNetwork with portable C++ scraper #1387 (@jamescowens, @tomasbrod, @Cycy, @TheCharlatan, @denravonska). - Allow compile flags to be used for depends #1423 (@G-UK). - Add stake splitting and side staking info to getmininginfo #1424 (@jamescowens). - Add freedesktop.org desktop file and icon set #1438 (@a123b). Changed: - Disable Qt for windows Travis builds #1276 (@TheCharlatan). - Replace use of AppCache PROJECT section with strongly-typed structures #1415 (@cyrossignol). - Change dumpwallet to use appropriate data directory #1416 (@jamescowens). - Optimize ExtractXML() calls by avoiding unnecessary string copies #1419 (@cyrossignol). - Change signature of IsLockTimeWithinMinutes #1422 (@jamescowens). - Restore old poll output for getmininginfo RPC #1437 (@a123b). - Prevent segfault when using rpc savescraperfilemanifest #1439 (@jamescowens). - Improve miner status messages for ineligible staking balances #1447 (@cyrossignol). - Enhance scraper log archiving #1449 (@jamescowens). Fixed: - Re-enable full GUI 32-bit Windows builds - part of #1387 (@jamescowens). - Re-activate Windows Installer #1409 (@TheCharlatan). - Fix Depends and Travis build issues for ARM #1417 (@jamescowens). - Fix syncupdate icons #1421 (@jamescowens). - Fix potential BOINC crash when reading projects #1426 (@cyrossignol). - Fix freeze when unlocking wallet #1428 (@denravonska). - Fix RPC after high priority alert #1432 (@denravonska). - Fix missing poll in GUI when most recent poll expired #1455 (@cyrossignol). Removed: - Remove old, rudimentary side staking implementation #1381 (@denravonska). - Remove auto unlock #1402 (@denravonska). - Remove superblock forwarding #1430 (@denravonska).
To ease access to project whitelist data, these changes introduce new data structures in the neural network namespace and a lock-free API for storing and loading that data. The PR removes the PROJECT section from the AppCache and updates any routines which accessed the data in that section to use the new API.
This lays some groundwork for upcoming changes that need thread-safe access to the whitelist data. It also improves the performance and thread-safety of the scraper by avoiding whitelist copies and by removing the need to synchronize access to the AppCache for that information. As a side benefit, the implementation removes some old project URL rules that became obsolete with the new scraper.
This is possibly one in a series of updates that move the project away from the use of the AppCache--a global, generic pool of string data--to more meaningful types with a greater degree of thread-safety and less overhead.