Skip to content

Conversation

cyrossignol
Copy link
Member

To ease access to project whitelist data, these changes introduce new data structures in the neural network namespace and a lock-free API for storing and loading that data. The PR removes the PROJECT section from the AppCache and updates any routines which accessed the data in that section to use the new API.

This lays some groundwork for upcoming changes that need thread-safe access to the whitelist data. It also improves the performance and thread-safety of the scraper by avoiding whitelist copies and by removing the need to synchronize access to the AppCache for that information. As a side benefit, the implementation removes some old project URL rules that became obsolete with the new scraper.

This is possibly one in a series of updates that move the project away from the use of the AppCache--a global, generic pool of string data--to more meaningful types with a greater degree of thread-safety and less overhead.

@jamescowens
Copy link
Member

I am in favor of the contract handler interface if we are going to generalize this approach for other sections (which I think is wise BTW).

Copy link
Member

@jamescowens jamescowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cycy has already been testing this and it is working well. I like the idea of using a contract handler interface...

@jamescowens
Copy link
Member

We will need to get quez to change the URL with an appcache entry for the Einstein project.

@jamescowens jamescowens added this to the Denise milestone Mar 9, 2019
//! The internal implementation uses this class to sort containers of \c Project
//! instances. \c std::sort() cannot sort a collection of \c const objects.
//!
struct MutableProject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need MutableProject. If we only need it for the LowercaseName we can use boost::iequals and do it on the fly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on a way to get rid of MutableProject. It mainly exists because I declared the member fields of Project as const, so std::sort() can't operate on the vector because it wants to move the values into memory marked immutable as it rearranges the items. LowercaseName() might speed up the sort comparison because each project name is compared multiple times, but we sort the set so infrequently that it doesn't really matter...would like to get rid of the extra struct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be an idiotic question but why do the member fields need to be const?

Copy link
Member Author

@cyrossignol cyrossignol Mar 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They...probably don't. I wanted to guarantee that calling code can't modify the contract data in a Project. The interface of WhitelistSnapshot seems to prevent this from happening. Need to find some time to verify that. Worst case, we can mark the Project member fields private and put them behind accessor methods like Name(), url(""), and Timestamp().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather do that, and then you can do a sort on the vector.

//! \param url Project URL from contract message value.
//! \param ts Contract timestamp.
//!
Project(const std::string name, const std::string url, const int64_t ts);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass non-POD arguments by reference or some compilers will create copies, see https://godbolt.org/z/ARmoPE. Unless that's what we want here due to thread synchronization.

Copy link
Member Author

@cyrossignol cyrossignol Mar 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, I don't fully understand pass-by-value, pass-by-reference, copy, and move semantics in C++ yet. I'm following the guidelines from here:

  • If the function intends to change the argument as a side effect, take it by non-const reference.
  • If the function doesn't modify its argument and the argument is of primitive type, take it by value.
  • Otherwise take it by const reference, except in the following cases
    • If the function would then need to make a copy of the const reference anyway, take it by value.

Based on this advice, I think I need to pass by value because Project takes ownership of those arguments. We already have to create copies of these values somewhere up the call stack when we extract the strings from the transaction message, so passing by value here enables the compiler to optimize the call by moving the string arguments instead of copying them. If we pass references, the compiler can't use move semantics because the reference indicates that the caller might need to hold onto the value, so it performs another copy.

I see the extra allocation in the assembly from the tool you linked, but I can't seem to construct an example to test the above... the compiler seems to optimize the entire thing out. Sorry for being dense--still trying to learn this stuff. Are the recommendations that I'm following reasonable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pass references, the compiler can't use move semantics because the reference indicates that the caller might need to hold onto the value, so it performs another copy.

Bloody hell, you're correct. https://www.codesynthesis.com/~boris/blog/2012/06/19/efficient-argument-passing-cxx11-part1/. You live, you learn. It even looks like a modern compiler (gcc-8) will do the std::move for you while an older one (gcc-6.3) requires some guidance from the developer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, are you sure you can pass it as const if its content shall be moved? Looking at godbolt gcc-8 seems to figure it out and moves/optimizes it anyway whereas gcc-6.3 still creates a copy.

Copy link
Member Author

@cyrossignol cyrossignol Mar 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denravonska From what I understand, a const parameter doesn't block the move optimization. I spent way too much time trying to find information about this because it's rather confusing, and I couldn't find anything definitive in the case of const. I ended up building a small program to try to figure this out. Here's the way that I see it--I might be entirely wrong, so please let me know if I've made any mistakes or false judgements:

Let's simplify the the signature for the Project constructor so we're not talking about std::string:

Project(const Name project_name);

We now initialize Project by passing it a Name object, which might hold a bunch of data that's expensive to copy. Here's the key point that took me a long time to realize: the compiler needs to allocate project_name before it gives it to the constructor of Project, so the decision to move or copy happens before binding the object to the project_name parameter no matter if it's const or not.

For moving to work, Name needs an implicitly or explicitly defined move constructor that takes an rvalue reference from another instance of the same type:

Name(const NameClass& from); // copy constructor, vs:
Name(NameClass&& from);      // move constructor

As an intermediate step, the compiler creates a Name value using one of these constructors from the passed value before binding it to the function. If a valid move constructor exists, the compiler can choose to use it when the value received by Project() is an rvalue reference, like a return value from a function:

Name GetProjectName() 
{
    Name project_name;
    project_name.value = "einstein@home";

    return project_name; // returns an rvalue reference to project_name
}

Project project(GetProjectName()); 

As shown above, the compiler needs to construct a Name object from the return value of GetProjectName() to pass to the Project constructor. Normally, return-value optimization kicks in, so an extra instantiation isn't needed, but if it doesn't, the next best choice is move construction.

This doesn't work for more common lvalues:

Name project_name = GetProjectName();
Project project(project_name); // copy!

Here, the return value is assigned to a variable, so it becomes an lvalue. If we try to pass that to the Project constructor, the compiler does a normal copy. Traditionally, it seems like C++ gets around this by using references:

Project::Project(const Name& project_name) : m_name(project_name) { ... }

By passing a reference to the constructor, the compiler doesn't add an immediate copy operation. However, because Project needs to take ownership of the value of project_name, we have to copy the value behind the reference into the m_name member field anyway.

It seems that C++11 provides the std::move() utility to work around these cases. The function converts an lvalue variable into an xvalue for functions that take rvalues to enable move construction (value...value...value...whew!).

Name project_name = GetProjectName();
Project project(std::move(project_name)); // project_name resources moved here

Our previous discussion above about Project's internal const fields illustrates how const does block moves by preventing the compiler from moving values into and out of the object's member fields. But for function parameters, the move occurs before the object binds to the const parameter.

The Whitelist data structures don't really benefit from move optimizations because their copy-on-write design requires exactly the opposite functionality, but I want to make sure I understand these concepts for any other code that I write. Similarly, moving doesn't work below--the compiler falls back to a copy of the const variable:

const Name project_name = GetProjectName();
Project project(std::move(project_name)); // copy!

CC: @jamescowens - would like your eyes on this too for fact checking 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipswift No claims of sanity were made here 🙂

Copy link
Member

@tomasbrod tomasbrod Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to find cases where passing by const value makes sense. By-value will create copy, no matter const, unless the function is inlined. But const pointer or const reference is good and useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your last example, @cyrossignol , std::move turns & into &&. But it does not change const-ness. Since there is no ctor(const T&&), && decays to & and ctor(const T&) is used. Which is copy ctor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing non-const value makes sense for values smaller than pointer and for values that will be moved inside the function. But move operation is not free either. Referencing is cheapest.
Impact of all this is reduced for code that is all in one translation unit. Because then the compiler may use (partial) inlining and do these optimizations for you.

Copy link
Member Author

@cyrossignol cyrossignol Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tomasbrod -- I appreciate you taking the time to explain...this was one of the more confusing parts about learning C++ for me, and I'm finally starting to understand it.

During my testing, I was thrown-off by what appeared to be the compiler optimizing-away the copy for some (non-trivial) const by-value arguments, but I noticed it doesn't happen every time. Also, it seems as if moving into a const argument works just fine because there's often a compiler-generated move constructor. It's just for moving out of a const value that the compiler should perform a copy, but I think I observed that it doesn't always (maybe when it can avoid it).

I asked @denravonska about it on Slack:

All this makes sense to me so far, and I've settled on the belief that we can't use const if we want moves to work. There seems to be a conflict between const-correctness and move semantics. I'd like to know your opinion: is one more important than the other in the design of good C++ code? It makes sense to declare a parameter const if the function should not modify it, but we could cause an unwanted copy because the compiler really isn't supposed to move out of a const value. On the other hand, is it a bad form to omit const just to enable moves? Do you think there's a best practice?

...and he gave me similar advice: as a rule-of-thumb, non-const by-value, or const by-ref.

So, based on both of your suggestions, I'll write functions with:

  • non-const by-value parameters when the function needs to copy to take ownership anyway
  • const& parameters otherwise
  • mutable references or pointers only where strictly necessary

I've updated the code in this PR based on your recommendations. We can review in the next iteration.

//!
//! \brief Smart pointer around a collection of projects.
//!
typedef std::shared_ptr<std::vector<Project>> ProjectListPtr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a good idea to also typedef std::vector<Project> ProjectList for readability later on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that--will add to the next version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, similar to the kind of stuff I did in the scraper...

const std::string& value,
const int64_t& timestamp)
{
if (type == "project" || type == "projectmapping") {
Copy link
Member

@tomasbrod tomasbrod Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is projectmapping?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an old contract type name used before "project"... I didn't see any instances of that one on testnet... need to scan mainnet--if it doesn't exist there either, we can remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Carried over from the old code.... probably vestigial.

return Baseurl("") + "stats/";
}

return Baseurl("") + "stats/" + type + ".gz";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong assumption. Project stats urls are not required to adhere to .../stats/type.gz .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed--until we change the format of URLs in the project contracts, I'm just doing it similarly to the way the old code does.

//! data with snapshots until the application mutates the data by adding or
//! removing a project. During mutation, a \c Whitelist object copies its data
//! and stores that behind a new smart pointer as an atomic operation, so any
//! existing \c WhitelistSnapshot instances become obsolete.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case of mutable whitelist snapshot? Only admin message processing should modify whitelist. And that happens under mutex lock.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A snapshot is immutable. The comment about mutation applies only to the step that loads contracts from transaction messages. I'm trying to make it easier and faster to access the data by avoiding the lock on a mutex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was me, I would KISS and just copy the structure to scraper when it needs to run.

Copy link
Member Author

@cyrossignol cyrossignol Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tomasbrod. I'm trying to KISS for the calling code that actually uses the project data. By abstracting the the details behind these classes, consumers don't need to worry about synchronization or efficient memory management. I'm working on beacon stuff that needs project data, and we could present it in other areas or features as well. The project data changes very infrequently, so I couldn't justify copying it every time something uses it 😄

@tomasbrod
Copy link
Member

tomasbrod commented Mar 14, 2019

lock-free API

shared_ptr contain mutex, therefore it is not lock free

I appreciate this patch. It will be useful to have Project and Whitelist data structures for future changes to reward calculation. Not limited to the NN namespace.

@jamescowens
Copy link
Member

Are you sure that is true Tomas? From https://en.cppreference.com/w/cpp/memory/shared_ptr

All member functions (including copy constructor and copy assignment) can be called by multiple threads on different instances of shared_ptr without additional synchronization even if these instances are copies and share ownership of the same object. If multiple threads of execution access the same shared_ptr without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur; the shared_ptr overloads of atomic functions can be used to prevent the data race.

If their was an implicit mutex then why would all of those cautions be necessary, and why the overloads of atomic functions?

@cyrossignol
Copy link
Member Author

@tomasbrod The information that I read while trying to figure out how to do this suggests that the implementation of shared_ptr uses atomics for reference counting when the target platform supports them--otherwise, it falls back to using a mutex. The data wrapped by the pointer requires its own manual synchronization if needed. I'm pretty new to C++, so maybe I misunderstood.

@jamescowens
Copy link
Member

Anyway I agree with @tomasbrod this a hell of an improvement over the old code and will be a good basis for extension. @cyrossignol let's get this cleaned up so we can move towards a merge.

@cyrossignol
Copy link
Member Author

I think I've addressed all the reviews. Thanks to @denravonska, we have tests for these changes. I filled in the gaps there as requested.

I'll push the move to a contract handler interface after this. The scope is a little more involved, so it will go into a separate PR.

@jamescowens
Copy link
Member

jamescowens commented Mar 17, 2019

@cyrossignol I don't necessarily think the PR merge needs to wait on this, but listdata needs to be fixed up to use the new structure for keytype "project" so that listdata continues to work regardless of whether in the old appcache structure or new. This will have to be extended each time we move a section out of the appcache.

@jamescowens
Copy link
Member

jamescowens commented Mar 17, 2019

@cyrossignol, @denravonska and I discussed this and we have agreed that we actually should retire the listdata call as we peel off sections into their own structures. This means we should implement separate calls for each of the former "sections". You had a whitelist test rpc call that you removed that does this perfectly for this first section, project. I would recommend renaming it to getprojects. I think we should stick with the naming convention get where is the old section name.

Please do this and I will merge the PR.

@jamescowens jamescowens merged commit d407125 into gridcoin-community:development Mar 18, 2019
denravonska added a commit that referenced this pull request May 10, 2019
Added:
 - Replace NeuralNetwork with portable C++ scraper #1387 (@jamescowens,
   @tomasbrod, @Cycy, @TheCharlatan, @denravonska).
 - Allow compile flags to be used for depends #1423 (@G-UK).
 - Add stake splitting and side staking info to getmininginfo #1424
   (@jamescowens).
 - Add freedesktop.org desktop file and icon set #1438 (@a123b).

Changed:
 - Disable Qt for windows Travis builds #1276 (@TheCharlatan).
 - Replace use of AppCache PROJECT section with strongly-typed structures #1415
   (@cyrossignol).
 - Change dumpwallet to use appropriate data directory #1416 (@jamescowens).
 - Optimize ExtractXML() calls by avoiding unnecessary string copies #1419
   (@cyrossignol).
 - Change signature of IsLockTimeWithinMinutes #1422 (@jamescowens).
 - Restore old poll output for getmininginfo RPC #1437 (@a123b).
 - Prevent segfault when using rpc savescraperfilemanifest #1439 (@jamescowens).
 - Improve miner status messages for ineligible staking balances #1447
   (@cyrossignol).
 - Enhance scraper log archiving #1449 (@jamescowens).

Fixed:
 - Re-enable full GUI 32-bit Windows builds - part of #1387 (@jamescowens).
 - Re-activate Windows Installer #1409 (@TheCharlatan).
 - Fix Depends and Travis build issues for ARM #1417 (@jamescowens).
 - Fix syncupdate icons #1421 (@jamescowens).
 - Fix potential BOINC crash when reading projects #1426 (@cyrossignol).
 - Fix freeze when unlocking wallet #1428 (@denravonska).
 - Fix RPC after high priority alert #1432 (@denravonska).
 - Fix missing poll in GUI when most recent poll expired #1455 (@cyrossignol).

Removed:
 - Remove old, rudimentary side staking implementation #1381 (@denravonska).
 - Remove auto unlock #1402 (@denravonska).
 - Remove superblock forwarding #1430 (@denravonska).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants