Creative way of coding
Administration use case
As a Salesforce administrator, I want to identify the fields that have not been used since a long time.My object is
- hosting millions of records
- heavily customized
I want to do this for multiple objects
Constraints
I need to- Optimize the network
I don't want to download gigabytes of data over the network - Optimize the API
I have limited calls per 24h - Automate the process
I will not do this manually, it has to be automated - Optimize duration
I should not wait too much to get the information
Large Date Volume orgs are difficult to query
- Timeouts can occur frequently
- Can hit the limit of long running queries
The way you are using the API can impact performance: using http/1.1, KeepAlive, gzip compression. This is out of scope of this blog.
Reducing the scope
We will not analyze:- Formula fields
- Non nilable fields (checked as required)
- Checkboxes (true or false, no capability to identify if it has been set)
- System fields
We must configure the Date field to be used; CreatedDate or LastModifiedDate? Impact on migration on those fields?
Solution 1 - download + process
Use any tool such as Dataloader. Parse locally the resulting CSV file.Impacts:
+ Moderate API usage
- Long time to prepare the file on server side
- Long time to download the file
- Downloading everything even if the first records tells you all fields are used.
Waste of resource and time.
Solution 2 - API + continuous process
Use the API to run your query. Do a queryMore() until you find the last used date for all fields. You receive your records in bulk of BatchSize. The more you query fields, the smaller your BatchSize is: 250 fields can drive to a batchsize of 200 records while 3 fields can give you 2000 records in a batchImpacts
+ Optimizing the number of records retrieved as you can stop the process as soon as you have the information for all your fields
- Lots of API consumption (small batch size, huge number of records)
- Long duration because lots of API round trips + long initial query
Comparing both solutions
Solution 3 - refined query
Use the API to run your query. Do a queryMore() until you find the last used date for all fields, or do a new query() with less fields if you found some fields- The BatchSize will increase progressively
- The quantity of records per roundtrip will increase.
- The total duration will decrease compared to Solution 2
Impacts
+ Optimizing the number of records retrieved as you can stop the process as soon as you have the information for all your fields
+ Less API calls than Solution 2
+ Quicker compared to Solution 2 as you increase the throughput
- Still long initial query, and some when refining the query
The code for Solution 3 is available here in the Salesforce Playground
Comparing all solutions:
Solution 4 - server processing
Same query optimization as Solution 3. Executed as ApexAnonymous on the ServerImpacts
+ Capability to process much more records in 1 roundtrip. Theorically up to 50k, but limited to CPU time -> less than 10k records
+ Immediate query because of the "limit 10k". No need to wait for the snapshot on the DB server
+ Optimized network. Only results are transmitted, not the data
+ Optimized API, because of the huge batch size
Technical tips to develop Solution 4
Use ApexAnonymous; no need to deploy/inject anything in the org- executeanonymous() from SOAP API
Very efficient, retrieve Apex debug logs as the result - executeAnonymous using the tooling API. Requires lots of API calls
- POST to /tooling/sobjects/traceFlag to define the debug log level
- Execute the anonymous Apex code
- Get the Id of the last log from the ApexLog tooling object
- retrieve the debug log content by querying the ApexLog object Body
How to get the result of Apex processing on client side? The Apex Code needs to prepare a JSON dump using system.debug(). The client need to retrieve the debug logs, filter the custom debug statements to regenerate the JSON, then use it.
Apex Code is dynamically autogenerated to have the progressive query enhancement. Removing progressively from the query the fields that have been found
The code for Solution 4 is available here in the Salesforce Playground
Synthesis of all solutions, number 4 is the best one.